Chapter 6 

Gabor Representations 


6.1 Introduction 

This is the last chapter of general background material before turning to 
the topic of field computation proper, which occupies the remainder of the 
book. The issue is the representation of continuous fields (images, signals)
extended in one or more continuous dimensions, including time. We begin
with a fundamental way of quantifying the information-carrying capacity of
a signal, which was developed by Gabor and is complementary to Shannon’s
better-known measure. This has interesting, mathematically rigorous
connections to the Heisenberg uncertainty principle and to wave-particle duality,
which are important for applications in quantum computation. Interestingly,
Gabor-like representations seem to be used by the brain, especially in primary
visual cortex, and so we review the evidence for this. In any case,
Gabor wavelets have proved to be valuable multi-resolution representations 
in many practical applications. While all the mathematical essentials are 
here (especially in the appendices to the chapter), our principal goal is to 
build intuition for the material. 

Dennis Gabor is best known as the father of holography, in recognition 
of his development of its theory in 1947. In this chapter, however, we are 
concerned with his theory of communication, published in 1946 (Gabor 1946), 
two years before Claude Shannon’s more famous theory (Shannon 1948). 
Gabor’s theory was not simply an anticipation of Shannon’s (as was Hartley’s, 
for example); rather it addresses a completely different aspect of the nature of 
communication. It also provides a basis for the representation and processing 
of information in vision and perhaps other sensory modalities. This aspect
will be our concern here. 

First we review Gabor’s Uncertainty Principle, which defines limits on the 
representation of any signal. Next we discuss the representation of signals in 
terms of Gabor elementary functions (Gaussian-modulated sinusoids), which 
is optimal in terms of the uncertainty principle and has several advantages 
over representations based on Fourier series and the Sampling Theorem. After
reviewing John G. Daugman’s research supporting the presence of Gabor
representations in mammalian vision, we discuss their pros and cons compared
to wavelet-based representations. Finally we present an extension of the 
Gabor representation and apply it to the representation and processing of 
spatiotemporal patterns in the visual cortex. 

6.2 The Gabor Uncertainty Principle 

Gabor proved his uncertainty principle by applying to arbitrary signals the 
same mathematical apparatus as used in the Heisenberg-Weyl derivation of 
the uncertainty principle in quantum mechanics. We give our own version of 
this proof in the chapter appendix (Sec. 6.12.1, p. 144); here our intent is to 
build intuition, so we present several informal derivations. 

Suppose we are trying to measure the frequency of a tone. Intuitively,
the longer the sample we take, the more accurate will be our measurement
(Fig. 6.1), which suggests that the error in measuring the frequency, $\Delta f$, is
inversely related to the duration of the measurement, $\Delta t$. This intuition can
be made a little more precise by considering a very basic kind of frequency
measurement. Suppose we have a device that counts every time our signal
reaches a maximum; then the number of maxima in an interval of time $\Delta t$
determines the average frequency during that interval (Fig. 6.2). How long must
$\Delta t$ be in order to guarantee we can distinguish frequencies differing by $\Delta f$?
This will occur when the counts for $f$ and $f + \Delta f$ are guaranteed to differ
by at least one (Fig. 6.3). That is,

$$(f + \Delta f)\,\Delta t - f\,\Delta t \geq 1,$$

or,

$$\Delta f\,\Delta t \geq 1. \tag{6.1}$$

This is the basic Gabor Uncertainty Principle; it means that the product of
the uncertainties in frequency and time must be at least a fixed constant, and
so the accuracy with which one of them can be measured limits the best
possible accuracy with which the other can be measured.¹

Figure 6.1: Improved frequency measurement over longer time intervals. The
uncertainty in the frequency $\Delta f$ decreases as the measurement interval $\Delta t$
increases, and vice versa.

Figure 6.2: Measuring frequency by counting maxima in a given time interval.
The circled numbers indicate the maxima counted during the measurement
interval $\Delta t$. Since signals of other frequencies could also have the same number
of maxima in that interval, there is an uncertainty $\Delta f$ in the frequency.

Figure 6.3: Minimum time interval $\Delta t$ to detect frequency difference $\Delta f$.
If two signals differ in frequency by $\Delta f$, then a measurement of duration
$\Delta t \geq 1/\Delta f$ is required to guarantee a difference in counts of maxima. Italic
numbers indicate maxima of signal of frequency $f$; roman numbers indicate
maxima of signal of higher frequency $f + \Delta f$.
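
The counting argument behind Eq. 6.1 can be checked with a few lines of code. The following is only a sketch under stated assumptions: the tone is taken to be a cosine with a maximum at $t = 0$, and the function names and the example values (10 Hz, 0.5 Hz) are ours, chosen for illustration.

```python
import math

def maxima_count(freq_hz: float, duration_s: float) -> int:
    """Count maxima of cos(2*pi*f*t) with 0 < t <= duration_s.

    Assumes the tone has a maximum at t = 0, so later maxima fall at t = n / f.
    """
    return math.floor(freq_hz * duration_s)

def counts_differ(freq_hz: float, delta_f: float, duration_s: float) -> bool:
    """True if tones at f and f + delta_f give different maxima counts."""
    low = maxima_count(freq_hz, duration_s)
    high = maxima_count(freq_hz + delta_f, duration_s)
    return high - low >= 1

f, delta_f = 10.0, 0.5                        # hypothetical example values
too_short = counts_differ(f, delta_f, 1.0)    # delta_f * delta_t = 0.5
long_enough = counts_differ(f, delta_f, 2.0)  # delta_f * delta_t = 1.0
```

With these values the one-second measurement yields the same count for both tones, whereas the two-second measurement separates them, as the inequality $\Delta f\,\Delta t \geq 1$ suggests.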

Heisenberg’s Uncertainty Principle is a simple corollary of Eq. 6.1, since
according to quantum mechanics the energy of a photon is proportional to
its frequency, $E = hf$. Multiplying both sides of Eq. 6.1 by $h$ (Planck’s
constant) yields

$$\Delta E\,\Delta t \geq h,$$

which is one form of Heisenberg’s principle.² Of course, Heisenberg derived
his principle first; Gabor’s accomplishment was to show that the same mathematical
derivation applied to communication systems.

A more formal derivation of Gabor’s Uncertainty Principle is based on
the observation that the “spread” of a signal and its Fourier transform are
inversely proportional (Fig. 6.4).³ To accomplish this we must first specify
a way of measuring the spread of functions, especially when they are not
strictly local (i.e., have noncompact support). For suppose we measure a
frequency $f$ over an interval of time $\Delta t$; this does not imply that the frequency
during that interval was always in the range $f \pm \Delta f$; it means only
that the average frequency over that interval was in $f \pm \Delta f$. The instantaneous
frequency could have varied widely, and so its spectrum might look like
that in Fig. 6.5 (assumed to be centered on $f$). Nevertheless, we can assign
a nominal bandwidth to the spectrum that measures its spread around the
measured frequency $f$. Alternately we can imagine that Fig. 6.5 represents
the transfer function of a band-pass filter; the nominal bandwidth is a measure
of the width of the band compared with that of an ideal band-pass filter.

¹ Note that we have shown a duration $\Delta t \geq 1/\Delta f$ is necessary to discriminate frequencies
differing by $\Delta f$. On the other hand, if we measure a frequency $f$ during an interval
$\Delta t$, then the actual frequency could be as low as $f - 1/\Delta t$ or as high as $f + 1/\Delta t$. Therefore,
the uncertainty around $f$ is $\Delta f \geq 2/\Delta t$, giving the uncertainty principle $\Delta f\,\Delta t \geq 2$.
Furthermore, there are other methods of measuring the frequency, such as counting sign
changes (zero crossings), which would give $\Delta f\,\Delta t \geq 1/2$ for the discrimination case and
$\Delta f\,\Delta t \geq 1$ for the measurement case. Thus although the exact constant depends on what
we are measuring and how we are measuring it, its value doesn’t much matter, since the
conclusion is the same: there is a lower limit on the product of the uncertainties in the
time and frequency domains. For the sake of simplicity we use the constant 1.

² Different methods of measuring $\Delta E$ and $\Delta t$ yield different constants on the right-hand
side, such as $\hbar = h/2\pi$ or $\hbar/2$. Again, the exact constant doesn’t matter. Also,
since $E = pv/2 = pxf/2$, we have the other common form of the Heisenberg principle,
$\Delta p\,\Delta x \geq 2h$.

³ The derivation follows Yu (1976, pp. 44-45).

[Figure 6.4: panels (a)-(d), each with a Time Domain and a Frequency Domain plot]

Figure 6.4: The “spread” of a signal and its Fourier transform are inversely
proportional. (a) A constant function in the time domain corresponds to a
unit impulse (Dirac delta function) in the frequency domain. (b, c) As the
width of a pulse in the time domain decreases, its spectrum in the frequency
domain spreads (spectrum shown is schematic). (d) A unit impulse in the
time domain has a spectrum which is a constant function.

Figure 6.5: Nominal bandwidth in frequency domain of nonnegative spectrum.
The nominal bandwidth is the width of a rectangular pulse (shaded)
that has the same area as the continuous spectrum and has a height equal
to its amplitude at the origin.


We say that nominal bandwidth measures the spectrum’s localization in the 
frequency domain. Similarly, to a signal that may not actually be localized 
in a particular interval of time, we assign a nominal duration that measures 
its spread in time, and thus its localization in the time domain.⁴

Although there are many ways to define these measures, we define the
nominal duration of a nonnegative signal $\phi$ to be the duration of a rectangular
pulse of the same area and amplitude at the origin as the signal (Fig. 6.6).⁵
Thus the nominal duration $\Delta t$ is defined by the equation

$$\Delta t\,\phi(0) = \int_{-\infty}^{\infty} \phi(t)\,dt \qquad (\phi(t) \geq 0). \tag{6.2}$$

Similarly, the nominal bandwidth $\Delta f$ of the Fourier transform of $\phi$, $\Phi = \mathcal{F}(\phi)$, is
defined by

$$\Delta f\,\Phi(0) = \int_{-\infty}^{\infty} \Phi(f)\,df \qquad (\Phi(f) \geq 0). \tag{6.3}$$

⁴ We call a function local if most of its area is concentrated in a compact region; we call
it strictly local if it has compact support (roughly, it is zero outside of a compact region).
For example, the normal distribution is local, but a finite pulse is strictly local. Note
that we can have a local function that is in fact more localized than a given strictly local
function. We call a function nonlocal if its area is spread more or less uniformly over its
(noncompact) domain; sine and cosine are good examples.

⁵ General (possibly negative) signals are considered later. Obviously there are many
ways to measure the spread of a function; for example, Gabor (1946) uses the variance,
as does Hamming (1989, pp. 181-184); in Appendix 6.12.1 we use the standard deviation.
The choice of measure affects only the constant on the right-hand side of the uncertainty
relation.

Figure 6.6: Nominal duration in time domain of nonnegative signal. The
nominal duration is the duration of a rectangular pulse (shaded) that has
the same area as the signal and has a height equal to its amplitude at the
origin.

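Next write $\Phi(0)$ as the Fourier transform of $\phi$ evaluated at $f = 0$:

$$\Phi(0) = \left.\int_{-\infty}^{\infty} \phi(t)\,e^{-2\pi i f t}\,dt\,\right|_{f=0} = \int_{-\infty}^{\infty} \phi(t)\,dt = \Delta t\,\phi(0).$$

Therefore,

$$\Delta t = \frac{\Phi(0)}{\phi(0)}. \tag{6.4}$$

Similarly, applying the inverse Fourier transform,

$$\phi(0) = \left.\int_{-\infty}^{\infty} \Phi(f)\,e^{2\pi i f t}\,df\,\right|_{t=0} = \int_{-\infty}^{\infty} \Phi(f)\,df = \Delta f\,\Phi(0).$$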

Figure 6.7: Nominal duration in time domain of arbitrary signal. Signal
shown as solid line, absolute value of signal shown as dashed line. The
nominal bandwidth of a spectrum is the width of a rectangular pulse (shaded)
that has a height equal to the spectrum’s amplitude at the origin, and that
has the same area as the absolute value of the spectrum.


Therefore,

$$\Delta f = \frac{\phi(0)}{\Phi(0)}. \tag{6.5}$$

Multiplying Eq. 6.4 by Eq. 6.5 yields

$$\Delta f\,\Delta t = \frac{\phi(0)}{\Phi(0)}\,\frac{\Phi(0)}{\phi(0)} = 1. \tag{6.6}$$


Thus we see that the nominal duration and nominal bandwidth are reciprocals
of each other, provided the signal and its Fourier transform are both
nonnegative. In other words, there is a minimum possible simultaneous
localization of the signal in the time and frequency domains.
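
As a numerical illustration of Eqs. 6.2 to 6.6, consider a Gaussian signal, whose Fourier transform is again a nonnegative Gaussian. The sketch below is not part of the derivation; the width parameter `a`, the helper `integrate` (a plain trapezoidal rule), and the integration limits are assumptions made for the example.

```python
import math

def integrate(func, lo, hi, steps=20_000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * h)
    return total * h

a = 2.0  # hypothetical width parameter of the Gaussian

def phi(t):
    """Nonnegative signal: exp(-pi t^2 / a^2)."""
    return math.exp(-math.pi * t * t / (a * a))

def Phi(f):
    """Its Fourier transform: a * exp(-pi a^2 f^2), also nonnegative."""
    return a * math.exp(-math.pi * a * a * f * f)

nominal_duration = integrate(phi, -20.0, 20.0) / phi(0.0)    # Eq. 6.2
nominal_bandwidth = integrate(Phi, -20.0, 20.0) / Phi(0.0)   # Eq. 6.3
product = nominal_duration * nominal_bandwidth               # Eq. 6.6
```

For this pair the nominal duration comes out as `a` and the nominal bandwidth as `1/a`, so widening the signal narrows its spectrum while the product stays at 1.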

Now we consider the general case, in which the signal and its Fourier 
transform may take on negative values. This is accomplished by defining the 
nominal spreads in terms of the absolute values of the functions (Fig. 6.7): 


$$\Delta t\,|\phi(0)| = \int_{-\infty}^{\infty} |\phi(t)|\,dt, \tag{6.7}$$

$$\Delta f\,|\Phi(0)| = \int_{-\infty}^{\infty} |\Phi(f)|\,df. \tag{6.8}$$

The absolute value weakens our previous equality to an inequality: 


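$$\Delta t\,|\phi(0)| = \int_{-\infty}^{\infty} |\phi(t)|\,dt \geq \left|\int_{-\infty}^{\infty} \phi(t)\,dt\right| = |\Phi(0)|,$$

$$\Delta f\,|\Phi(0)| = \int_{-\infty}^{\infty} |\Phi(f)|\,df \geq \left|\int_{-\infty}^{\infty} \Phi(f)\,df\right| = |\phi(0)|.$$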


These equations give bounds on the nominal spreads in terms of the signal 
and its transform at the origin: 


$$\Delta f \geq \frac{|\phi(0)|}{|\Phi(0)|}, \qquad \Delta t \geq \frac{|\Phi(0)|}{|\phi(0)|}.$$


From these equations we get the general Gabor Uncertainty Principle: 


$$\Delta f\,\Delta t \geq 1. \tag{6.9}$$

It should be noted that such an uncertainty principle applies whenever we
make simultaneous measurements of a function and its Fourier transform.⁶
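
The general inequality can also be illustrated numerically with a signal that takes negative values. The sketch below uses a Gaussian-modulated cosine; the parameters `a` and `f0`, the helper `integrate`, and the integration limits are assumptions for the example, and the closed form for `Phi` is the standard transform of a Gaussian multiplied by a cosine.

```python
import math

def integrate(func, lo, hi, steps=40_000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * h)
    return total * h

a, f0 = 2.0, 0.5  # hypothetical envelope width and carrier frequency

def phi(t):
    """Signal with negative lobes: Gaussian envelope times a cosine."""
    return math.exp(-math.pi * t * t / a**2) * math.cos(2 * math.pi * f0 * t)

def Phi(f):
    """Its Fourier transform: two Gaussians centered on +f0 and -f0."""
    g_plus = math.exp(-math.pi * a**2 * (f - f0) ** 2)
    g_minus = math.exp(-math.pi * a**2 * (f + f0) ** 2)
    return 0.5 * a * (g_plus + g_minus)

# Nominal spreads defined with absolute values (Eqs. 6.7 and 6.8).
nominal_duration = integrate(lambda t: abs(phi(t)), -20.0, 20.0) / abs(phi(0.0))
nominal_bandwidth = integrate(lambda f: abs(Phi(f)), -20.0, 20.0) / abs(Phi(0.0))
product = nominal_duration * nominal_bandwidth
```

Here the cancellation between positive and negative lobes makes $\Phi(0)$ small, so the product is well above 1; equality is reached only when no such cancellation occurs.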

The implications of Gabor’s principle are easier to understand by looking 
at it in “Fourier space,” where the abscissa reflects the time domain and the 
ordinate the frequency domain (Fig. 6.8). Then Gabor’s principle says that 
the spreads or uncertainties in the time and frequency measurements must 
define a rectangle in Fourier space whose area is at least 1. Thus we can 
decrease $\Delta t$, and so localize the signal better in the time domain, or decrease
$\Delta f$, and so localize it better in the frequency domain, but we cannot localize
it arbitrarily well in both domains simultaneously. The most we can localize
signals in Fourier space is into rectangles of area $\Delta f\,\Delta t = 1$.


6.3 Gabor Representation of One-Dimensional 
Signals 

Suppose we are transmitting information by sending signals of various frequencies
of bandwidth $F$ during an interval of time $T$.⁷ Suppose we sample
the signal over intervals of length $\Delta t$ to determine the signal strength at
various frequencies (say, through a bank of band-pass filters). Then the closest
frequencies we will be able to distinguish will be given by $\Delta f = 1/\Delta t$;
that is, any frequencies differing by less than this $\Delta f$ will be operationally
indistinguishable. Thus our measuring apparatus divides Fourier space into
information cells of size $\Delta f\,\Delta t \geq 1$ (Fig. 6.9). Since the most efficient possible
(noiseless) channel will have $\Delta f\,\Delta t = 1$, the number of such elementary
information cells determines the maximum amount of information that can
be transmitted. No matter how we divide up Fourier space, its area gives the
number of elementary information cells, and thus the number of independent
quantities that can be transmitted. For example, in the simple case where
$T = M\,\Delta t$, $F = N\,\Delta f$ and $\Delta f\,\Delta t = 1$, we are able to transmit $MN$ independent
quantities. For this reason Gabor defined a $\Delta f\,\Delta t = 1$ rectangle in
Fourier space to be the basic quantum of information and called it a logon.
Thus any device (of the given bandwidth) can transmit at most $MN$ logons
of information (in the given time interval).⁸

⁶ In the language of quantum mechanics, $f$ and $t$ are conjugate variables.

⁷ As Gabor notes, all real, physical signals have finite bandwidth and duration.

Figure 6.8: Minimum possible localization of signal in Fourier space. The
product of the nominal duration $\Delta t$ and nominal bandwidth $\Delta f$ of a signal
must be at least 1.

Figure 6.9: A band-limited signal is defined by a fixed number of elementary
information cells. The most efficient possible (noiseless) channel divides
Fourier space into information cells of size $\Delta f\,\Delta t = 1$. Each cell contains one
logon of information.
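
The cell-counting argument can be made concrete with a small calculation. This is a sketch only; the function name and the example channel (2 s duration, 4000 Hz bandwidth) are hypothetical, and the cells are assumed to tile the region exactly.

```python
def logon_count(duration_s: float, bandwidth_hz: float, delta_t: float):
    """Return (M, N, MN) for elementary cells with delta_f * delta_t = 1."""
    delta_f = 1.0 / delta_t                 # most efficient cell shape
    m = round(duration_s / delta_t)         # number of time intervals, T = M * delta_t
    n = round(bandwidth_hz / delta_f)       # number of frequency bands, F = N * delta_f
    return m, n, m * n

# Two different ways of dividing the same region of Fourier space.
fine_in_time = logon_count(2.0, 4000.0, 0.01)
fine_in_frequency = logon_count(2.0, 4000.0, 0.05)
```

Both divisions give the same number of logons, equal to the area $FT$ of the region, even though the cells have different shapes.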

⁸ The reader will wonder how Gabor’s measure of information relates to Shannon’s; in
fact they are orthogonal. Gabor’s measure, which may be called structural information,
quantifies the number of possible degrees of freedom. Shannon’s measure, which may
be called metrical or selective information, quantifies the decrease in a priori uncertainty
in a single one of these degrees of freedom. For example, in an optical device, the resolving
power is equivalent to the logon content or structural information, whereas the
logarithm of the number of discriminable brightness levels is equivalent to selective or
Shannon information. Both notions of information are necessary for a complete theory
of communication (MacKay 1969, pp. 178-180, 186-189; Cherry 1978, pp. 47-49). Remarkably,
by 1928 Hartley had anticipated Gabor and Shannon by suggesting that the
information transmittable over a channel is proportional to $MN \log S$, where $S$ is the
number of discriminable power levels (Cherry 1978, p. 47).

Gabor also showed that the minimum area in Fourier space is achieved by
Gaussian-modulated complex exponential functions of the form (Fig. 6.10):

$$\phi_{jk}(t) = \exp[-\pi (t - j\,\Delta t)^2/\alpha^2]\,\exp[2\pi i k\,\Delta f\,(t - j\,\Delta t)], \tag{6.10}$$

where $\Delta f\,\Delta t = 1$.⁹ Notice that the first factor leads to a Gaussian envelope
centered on $j\,\Delta t$, and the second factor is the conjugate exponential form of
the trigonometric functions of frequency $k\,\Delta f$. The parameter $\alpha$ determines
the locality (spread) of the Gaussian envelope; it is proportional to its standard
deviation. So we have a periodic function modulated by a Gaussian
envelope, a coherent state or wave packet in the terminology of quantum mechanics.
This can be seen more clearly by using Euler’s formula to rewrite
Eq. 6.10 in terms of the cis (cosine + i sine) function and then in terms of
the sine and cosine functions:

$$\begin{aligned}
\phi_{jk}(t) &= \exp[-\pi (t - j\,\Delta t)^2/\alpha^2]\,\operatorname{cis}[2\pi k\,\Delta f\,(t - j\,\Delta t)] \\
&= \exp[-\pi (t - j\,\Delta t)^2/\alpha^2]\,\cos[2\pi k\,\Delta f\,(t - j\,\Delta t)] \\
&\quad + i\,\exp[-\pi (t - j\,\Delta t)^2/\alpha^2]\,\sin[2\pi k\,\Delta f\,(t - j\,\Delta t)].
\end{aligned}$$

Thus the Gabor elementary function is the sum of the Gaussian cosine function and
$i$ times the Gaussian sine function (Figs. 6.11 and 6.12). If we let $C_{jk}$ and $S_{jk}$ represent
the Gaussian cosines and sines,

$$C_{jk}(t) = \exp[-\pi (t - j\,\Delta t)^2/\alpha^2]\,\cos[2\pi k\,\Delta f\,(t - j\,\Delta t)], \tag{6.11}$$

$$S_{jk}(t) = \exp[-\pi (t - j\,\Delta t)^2/\alpha^2]\,\sin[2\pi k\,\Delta f\,(t - j\,\Delta t)], \tag{6.12}$$

then $\phi_{jk} = C_{jk} + i S_{jk}$.

⁹ Although Gabor showed that these functions occupy minimum area in terms of his
(variance-based) definition of nominal spread, they also do so in terms of the definitions
in Eqs. 6.7 and 6.8.

Figure 6.10: The Gaussian-modulated complex exponential, or Gabor elementary
function $\phi_{jk}$. The $t$ axis goes from left to right through the center
of the spiral. The imaginary axis is vertical; the real axis is horizontal,
perpendicular to the other two axes. In this case $j = 0$ (no displacement from
origin), $k = 1$, $\Delta f = 1$ and $\alpha^2 = 20$. The function is plotted from $t = -6$ to
$t = 6$.

Figure 6.11: The Gaussian cosine function $C_{jk}$, or even-symmetric component
of $\phi_{jk}$. In this case $j = 0$ (no displacement from origin), $k = 1$, $\Delta f = 1$ and
$\alpha^2 = 20$.

Figure 6.12: The Gaussian sine function $S_{jk}$, or odd-symmetric component
of $\phi_{jk}$. In this case $j = 0$ (no displacement from origin), $k = 1$, $\Delta f = 1$ and
$\alpha^2 = 20$.
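
Equations 6.10 to 6.12 translate directly into code. The following is a minimal sketch; the function names are ours, and the parameter values mirror those quoted in the captions of Figs. 6.10 to 6.12 ($j = 0$, $k = 1$, $\Delta f = 1$, $\alpha^2 = 20$), with $\Delta t = 1$ assumed.

```python
import cmath
import math

def gabor(t, j, k, dt, df, alpha):
    """Gabor elementary function phi_jk(t) of Eq. 6.10 (complex valued)."""
    tau = t - j * dt
    envelope = math.exp(-math.pi * tau ** 2 / alpha ** 2)
    return envelope * cmath.exp(2j * math.pi * k * df * tau)

def gaussian_cosine(t, j, k, dt, df, alpha):
    """Even-symmetric component C_jk(t) of Eq. 6.11."""
    tau = t - j * dt
    return math.exp(-math.pi * tau ** 2 / alpha ** 2) * math.cos(2 * math.pi * k * df * tau)

def gaussian_sine(t, j, k, dt, df, alpha):
    """Odd-symmetric component S_jk(t) of Eq. 6.12."""
    tau = t - j * dt
    return math.exp(-math.pi * tau ** 2 / alpha ** 2) * math.sin(2 * math.pi * k * df * tau)

ALPHA = math.sqrt(20.0)  # alpha^2 = 20, as in the figure captions
value = gabor(0.3, 0, 1, 1.0, 1.0, ALPHA)
parts = complex(gaussian_cosine(0.3, 0, 1, 1.0, 1.0, ALPHA),
                gaussian_sine(0.3, 0, 1, 1.0, 1.0, ALPHA))
```

Up to rounding error, `value` and `parts` agree, which is the identity $\phi_{jk} = C_{jk} + iS_{jk}$; for $k = 0$ the sine component vanishes identically.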

So far we have had little to say about the coefficients associated with the
elementary information cells; this is the topic we now address. Suppose a
rectangular region of Fourier space is divided into $MN$ elementary information
cells, and that $\psi$ is a signal whose duration and bandwidth are confined
to that region. For simplicity we assume that the cells are centered on frequencies
$f = 0, \Delta f, 2\,\Delta f, \ldots, (N-1)\,\Delta f$, and on times $t = 0, \Delta t, 2\,\Delta t, \ldots,
(M-1)\,\Delta t$ (Fig. 6.13). Gabor showed that any such (finite energy) $\psi$ can be
represented as a linear superposition of Gaussian sinusoids:

$$\psi = \sum_{j=0}^{M-1} \sum_{k=0}^{N-1} a_{jk} C_{jk} + b_{jk} S_{jk}. \tag{6.13}$$

Each Gaussian cosine $C_{jk}$ or sine $S_{jk}$ is localized in the cell centered on time
$j\,\Delta t$ and frequency $k\,\Delta f$; we call $j$ the cell’s time-interval quantum number
and $k$ its frequency-band quantum number. The real coefficients $a_{jk}$ and $b_{jk}$
show the amplitudes of Gaussian cosines and sines in each cell.

It would appear that $\psi$ is determined by $2MN$ real coefficients, but since
$S_{j0} = 0$, only $2MN - M$ of the coefficients are independent, as we can see
by writing Eq. 6.13 in the form:

$$\psi = \sum_{j=0}^{M-1} \left[ a_{j0} C_{j0} + \sum_{k=1}^{N-1} \left( a_{jk} C_{jk} + b_{jk} S_{jk} \right) \right].$$

The parameters $a_{j0}$ determine the DC value of $\psi$ in each of the $M$ time
intervals.

Figure 6.13: Representation of band-limited, finite-length signal by elementary
information cells centered on frequencies $f = 0, \Delta f, 2\,\Delta f, \ldots,
(N-1)\,\Delta f$, and on times $t = 0, \Delta t, 2\,\Delta t, \ldots, (M-1)\,\Delta t$. Cells are indexed
by frequency-band quantum numbers $k = 0, 1, \ldots, N-1$ and time-interval
quantum numbers $j = 0, 1, \ldots, M-1$. (Axes: time horizontal, frequency $f$ vertical.)

Just as is done in Fourier series, we can express $\psi$ as a complex series,

$$\psi = \sum_{j=0}^{M-1} \sum_{k=-N+1}^{N-1} c_{jk}\,\phi_{jk}, \tag{6.14}$$

where the complex coefficients are given by:

$$\begin{aligned}
c_{j0} &= a_{j0}, \\
c_{jk} &= (a_{jk} - i\,b_{jk})/2, & k > 0, \\
c_{jk} &= (a_{j,-k} + i\,b_{j,-k})/2, & k < 0.
\end{aligned}$$

Notice that $c_{j,-k} = c_{jk}^*$, the complex conjugate of $c_{jk}$. We omit the derivation
of the complex series as it can be found in any standard textbook on Fourier
series.
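
Although the derivation is omitted, the equivalence of the real series (Eq. 6.13) and the complex series (Eq. 6.14) is easy to check numerically. The sketch below does so for a small grid; the grid size, the envelope width, and the coefficient values are all hypothetical, chosen only to exercise the formulas.

```python
import cmath
import math

M, N = 2, 3          # hypothetical grid: 2 time intervals, 3 frequency bands
DT, DF = 1.0, 1.0    # cell spacing with DF * DT = 1
ALPHA = 1.0          # hypothetical envelope width

# Hypothetical real coefficients a[j][k], b[j][k]; b[j][0] is irrelevant since S_j0 = 0.
a = [[1.0, 0.5, -0.2], [0.3, -0.7, 0.4]]
b = [[0.0, 0.1, 0.6], [0.0, -0.3, 0.2]]

def envelope(t, j):
    """Gaussian envelope centered on time j * DT."""
    return math.exp(-math.pi * (t - j * DT) ** 2 / ALPHA ** 2)

def real_series(t):
    """Superposition of Gaussian cosines and sines (Eq. 6.13)."""
    total = 0.0
    for j in range(M):
        for k in range(N):
            angle = 2 * math.pi * k * DF * (t - j * DT)
            total += envelope(t, j) * (a[j][k] * math.cos(angle) + b[j][k] * math.sin(angle))
    return total

def complex_coefficients():
    """Build c_jk for k = -N+1, ..., N-1 from the real coefficients."""
    c = {}
    for j in range(M):
        c[(j, 0)] = complex(a[j][0], 0.0)
        for k in range(1, N):
            c[(j, k)] = complex(a[j][k], -b[j][k]) / 2
            c[(j, -k)] = complex(a[j][k], b[j][k]) / 2
    return c

def complex_series(t, c):
    """Superposition of Gabor elementary functions (Eq. 6.14)."""
    total = 0j
    for (j, k), coeff in c.items():
        phi = envelope(t, j) * cmath.exp(2j * math.pi * k * DF * (t - j * DT))
        total += coeff * phi
    return total

c = complex_coefficients()
```

The dictionary `c` holds $M(2N - 1)$ entries, the coefficients for $k$ and $-k$ are complex conjugates, and the two series agree (with a vanishing imaginary part) at any sample time.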

Although there are $M(2N-1)$ complex coefficients $c_{jk}$, we have seen
that $M(N-1)$ of them are complex conjugates of the others, and thus
are not independent. Out of the remaining $MN$ complex coefficients (one
corresponding to each elementary information cell), the $M$ coefficients $c_{j0}$ are
real, so once again we find that the signal is determined by $2MN - M$ real
values. Thus Gabor has shown that a signal of duration $T$ and bandwidth $F$
has $T(2F - \Delta f)$ (real) degrees of freedom, and is thus capable of conveying
$M(2N-1)$ independent real values.¹⁰

Gabor’s measure of information is consistent with the number of degrees
of freedom given by the Sampling Theorem (Shannon 1948). To see this,
observe that the highest-frequency elementary information cells are centered
at frequency $(N-1)\,\Delta f$; therefore their maximum frequency (as defined by
their nominal spread) is $f_m = (N - 1/2)\,\Delta f$. The Sampling Theorem says
that to reconstruct $\psi$ we must take equally spaced samples at a minimum of
the Nyquist rate, which is twice the maximum frequency. Therefore in
time $T$ the number of samples we must take is:

$$2 f_m T = 2(N - 1/2)\,\Delta f\,T = (2N-1)\,\Delta f\,\Delta t\,M = (2N-1)M.$$

So Gabor’s analysis and Shannon’s Sampling Theorem both show that $(2N-1)M$
real parameters determine a signal of duration $T$ and bandwidth $F$.

¹⁰ Note that $FT = (N\,\Delta f)(M\,\Delta t) = MN(\Delta f\,\Delta t) = MN$. Another way to interpret
the formula $M(2N-1)$ is that for each time interval and frequency band we have two real
parameters (an amplitude and a phase), except for the DC band, which has only an
amplitude.

We can also compare these results with the representation of the signal
by a finite Fourier series. To do this we treat the signal $\psi$ as periodic with
period $T$; then its highest frequency relative to this period is

$$H = f_m T = (N - 1/2)\,\Delta f\,(M\,\Delta t) = (2N-1)M/2.$$

The signal can be represented exactly by an $H + 1$ term Fourier series:

$$\psi(t) = \sum_{n=0}^{H} d_n \cos(2\pi n t/T) + e_n \sin(2\pi n t/T).$$

There appear to be $2(H+1)$ parameters $d_n$, $e_n$, but $e_0$ is irrelevant since the
corresponding sine term is identically zero. It also can be shown that $e_H$ is
irrelevant, since (by the Sampling Theorem) the signal is determined by $2H$
points, and over these points the last sine term is linearly dependent on the
other terms. Thus $\psi$ can be represented by a Fourier series determined by
$2H = (2N-1)M$ real parameters:

$$\psi(t) = d_0 + \sum_{n=1}^{H-1} \{ d_n \cos(2\pi n t/T) + e_n \sin(2\pi n t/T) \} + d_H \cos(2\pi H t/T).$$

Again, the band-limited, finite-length signal is seen to have $(2N-1)M$ real
degrees of freedom.¹¹
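
The three counts (Gabor, Sampling Theorem, Fourier series) can be compared with exact arithmetic. This is only an illustrative sketch: the function name and the default cell duration are hypothetical, and `Fraction` is used so that the half-integer factor $N - 1/2$ introduces no rounding.

```python
from fractions import Fraction

def degrees_of_freedom(m: int, n: int, dt: Fraction = Fraction(1, 10)):
    """Return the Gabor, Sampling Theorem and Fourier series counts.

    m, n: number of time intervals and frequency bands (T = m*dt, F = n*df).
    dt:   hypothetical cell duration; df is fixed by df * dt = 1.
    """
    df = 1 / dt
    duration = m * dt
    gabor_count = 2 * m * n - m                 # 2MN - M real coefficients
    f_max = (n - Fraction(1, 2)) * df           # f_m = (N - 1/2) * df
    sampling_count = 2 * f_max * duration       # samples at twice f_m over T
    highest_harmonic = f_max * duration         # H = f_m * T
    fourier_count = 2 * highest_harmonic        # 2H real parameters
    return gabor_count, sampling_count, fourier_count

counts = degrees_of_freedom(5, 4)
```

For $M = 5$ and $N = 4$ all three counts equal 35, and changing `dt` leaves them unchanged, since only the product $\Delta f\,\Delta t = 1$ enters.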

We have seen that any band-limited signal of finite duration can be represented
by a finite superposition of Gabor elementary functions. This raises
the question of whether arbitrary functions can be represented as (possibly
infinite) superpositions of the Gabor functions. In fact it can be shown (Heil
& Walnut 1989, pp. 656-657) that the set of Gabor functions $\phi_{jk}$ is complete
in $L^2(\mathbb{R})$, the set of square-integrable functions. That is, any signal $\psi$ of
finite energy can be written as an infinite sum

$$\psi = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} c_{jk}\,\phi_{jk}.$$

Equivalently,

$$\psi = \sum_{j,k=0}^{\infty} a_{jk} C_{jk} + b_{jk} S_{jk}.$$

(Note $S_{j0} = 0$, so the $b_{j0}$ are irrelevant.) On the other hand, the Gabor
elementary functions do not form a basis for the $L^2(\mathbb{R})$ functions, an issue
addressed later (Sections 6.6 and 6.9).

¹¹ Some authors argue that the logon content is $MN + 1$ complex parameters or
$(2N-1)M + 1$ real parameters; it is also possible to argue for an extra degree of freedom in the
Fourier and Sampling representations. The practical difference is slight, since typically
$MN \gg 1$, but the issue is important for information theory (MacKay 1969, pp. 185-186;
Brillouin 1956, p. 97). We leave it unresolved.

There is another way to understand the relation between representations
based on Gabor elementary functions, Fourier series, and the Sampling Theorem
(Gabor 1946, p. 435). Notice that as $\alpha \to \infty$ the Gabor functions
become

$$\begin{aligned}
\phi_{jk}(t) &= \exp[2\pi i k\,\Delta f\,(t - j\,\Delta t)], \\
C_{jk}(t) &= \cos[2\pi k\,\Delta f\,(t - j\,\Delta t)], \\
S_{jk}(t) &= \sin[2\pi k\,\Delta f\,(t - j\,\Delta t)].
\end{aligned}$$

That is, in the $\alpha = \infty$ limit the wave packets have no locality, and the Gabor
representation reduces to the Fourier representation, sinusoids at a spacing
$\Delta f$. Conversely, as $\alpha \to 0$ the wave packets become more and more localized,
and in the limit pass over into Dirac delta functions (impulses) at a spacing
of $\Delta t$:

$$\begin{aligned}
\phi_{jk}(t) &= \delta(t - j\,\Delta t) + i\,\delta(t - j\,\Delta t), \\
C_{jk}(t) &= S_{jk}(t) = \delta(t - j\,\Delta t).
\end{aligned}$$

We see that the $\alpha = 0$ limit represents two samples ($a_{jk}$ and $b_{jk}$) for each $\Delta t$
interval, as required by the Sampling Theorem.
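
The two limits can be made tangible by evaluating the Gaussian envelope one cell spacing away from its center. The sketch below assumes $\Delta t = 1$; the two values of $\alpha$ are arbitrary stand-ins for "very large" and "very small".

```python
import math

def envelope(tau: float, alpha: float) -> float:
    """Gaussian envelope exp(-pi tau^2 / alpha^2) at offset tau from the cell center."""
    return math.exp(-math.pi * tau ** 2 / alpha ** 2)

# Envelope height one cell spacing (tau = 1) away from the center.
wide = envelope(1.0, 1000.0)   # large alpha: nearly flat, Fourier-like sinusoids
narrow = envelope(1.0, 0.1)    # small alpha: negligible, impulse-like samples
```

For the large value the envelope is essentially 1 across neighboring cells, so the packets behave like unwindowed sinusoids; for the small value it has effectively vanished before the next cell, so each packet acts like a sample at $j\,\Delta t$.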

The value of the Gabor representation lies in the locality of the elementary
functions. That is, although they are not strictly local (of compact support),
their sensitivity is concentrated in a small interval of time (measured by the
nominal duration). Because of the locality property, the Gabor representation
is physically more realistic than the Fourier representation, since it
represents a band-limited signal of finite duration (i.e., a physically realistic
signal) by a finite superposition of temporally local elementary signals. In
contrast, the Fourier representation of such a signal requires an infinite
superposition of nonlocal signals, and so depends on enormous cancellation in
order to result in a local superposition (Strang 1989, p. 614). This and the
fact that the Gabor elementary functions correspond to a quantum of information
are good theoretical reasons for choosing them as representational
primitives.

6.4 Gabor Representation of Two-Dimensional 
Signals 

We have seen that any (finite energy) one-dimensional function $\psi : \mathbb{R} \to \mathbb{R}$
can be represented as a linear superposition of Gabor elementary functions,
each of which represents one logon or quantum of information about the
function. Although we thought of these functions as time-varying signals
$\psi(t)$, it should be clear that this is not essential to the theory. $\psi(x)$ could
also represent a spatial pattern, in which case the Gabor elementary functions
represent information cells localized in space and spatial frequency. We must
make the change to the spatial domain when we come to problems in vision,
where it is necessary to consider two-dimensional functions $\psi : \mathbb{R}^2 \to \mathbb{R}$,
where $\psi(x, y)$ represents the intensity at spatial location $(x, y)$.

It might be expected that two-dimensional signals could be represented 
in terms of two-dimensional analogues of Gabor elementary functions, and 
in the early 1980s a number of researchers suggested Gaussian-modulated 
sinusoids as models of the receptive fields of simple cells in visual cortex 
(Marcelja 1980; Daugman 1980; Watson 1982; Pribram & Carlton 1986). 
Our presentation is based on Daugman (1985a, 1993). 

Daugman proved¹² two-dimensional analogues of Gabor’s Uncertainty
Principle,

$$\Delta x\,\Delta u \geq \frac{1}{4\pi}, \qquad \Delta y\,\Delta v \geq \frac{1}{4\pi}$$

(where $\Delta u$ and $\Delta v$ are the uncertainties in the $x$ and $y$ spatial frequencies),
and showed that the elementary information cells are occupied by Gabor
elementary functions of the form:

$$\phi_{pquv}(x, y) = \exp\left\{-\pi\left[\frac{(x-p)^2}{\alpha^2} + \frac{(y-q)^2}{\beta^2}\right]\right\} \exp\{2\pi i[u(x-p) + v(y-q)]\}. \tag{6.15}$$

¹² Our own proofs can be found in the chapter appendices 6.12.1 and 6.12.2.


The first factor is a two-dimensional Gaussian distribution centered on the
point $(p, q)$; the second factor is the conjugate exponential form of the trigonometric
functions, also centered on $(p, q)$. The parameters $(u, v)$ determine the
wave packet’s location in the frequency domain just as $(p, q)$ determine its location
in the spatial domain. The 2D Gabor function’s nominal $x$-spread and
$y$-spread are $\alpha$ and $\beta$, and so these parameters determine its two-dimensional
shape and spread.¹³
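
Equation 6.15 can be evaluated directly. The following minimal sketch uses our own function name, and the example parameters are those quoted in the caption of Fig. 6.14 ($\alpha^2 = \beta^2 = 20$, $u = 1/2$, $v = 1$, $p = q = 0$).

```python
import cmath
import math

def gabor_2d(x, y, p, q, u, v, alpha, beta):
    """2D Gabor elementary function of Eq. 6.15 (complex valued)."""
    envelope = math.exp(-math.pi * ((x - p) ** 2 / alpha ** 2 + (y - q) ** 2 / beta ** 2))
    carrier = cmath.exp(2j * math.pi * (u * (x - p) + v * (y - q)))
    return envelope * carrier

ALPHA = BETA = math.sqrt(20.0)   # alpha^2 = beta^2 = 20
center_value = gabor_2d(0.0, 0.0, 0.0, 0.0, 0.5, 1.0, ALPHA, BETA)
off_center = gabor_2d(1.0, 2.0, 0.0, 0.0, 0.5, 1.0, ALPHA, BETA)
even_part = off_center.real      # the cosine (even-symmetric) component
```

At the center $(p, q)$ the function equals 1, and elsewhere its magnitude is just the Gaussian envelope, since the complex carrier has unit modulus.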

As we did for the Gabor representation of 1D signals, we will make use of
Gabor elementary functions located on a regular grid in the spatial and spectral
domains. In this case we index the functions by the quantum numbers
$j, k, l, m$:

$$\phi_{jklm}(x, y) = \exp\left\{-\pi\left[\frac{(x - j\,\Delta x)^2}{\alpha^2} + \frac{(y - k\,\Delta y)^2}{\beta^2}\right]\right\} \exp\{2\pi i[l\,\Delta u\,(x - j\,\Delta x) + m\,\Delta v\,(y - k\,\Delta y)]\},$$

where the spacing is determined by $\Delta x\,\Delta u = 1$ and $\Delta y\,\Delta v = 1$.

The spatial frequency of the function in Eq. 6.15 is $f = \sqrt{u^2 + v^2}$ and its
orientation is $\theta = \arctan(v/u)$. Conversely, $u = f\cos\theta$ and $v = f\sin\theta$. This
gives an alternate form for the elementary functions:

$$\phi_{pqf\theta}(x, y) = \exp\left\{-\pi\left[\frac{(x-p)^2}{\alpha^2} + \frac{(y-q)^2}{\beta^2}\right]\right\} \times \exp\{2\pi i f[(x-p)\cos\theta + (y-q)\sin\theta]\}. \tag{6.16}$$
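
The change of variables between $(u, v)$ and $(f, \theta)$ can be sketched as follows. The function names are ours; note that the sketch uses `math.atan2(v, u)` rather than a literal $\arctan(v/u)$, an assumption made so that $u = 0$ and all four quadrants are handled.

```python
import math

def to_polar(u: float, v: float):
    """Return (f, theta): spatial frequency and orientation of wave vector (u, v)."""
    return math.hypot(u, v), math.atan2(v, u)

def to_cartesian(f: float, theta: float):
    """Return (u, v) = (f cos theta, f sin theta)."""
    return f * math.cos(theta), f * math.sin(theta)

def phase_uv(x, y, p, q, u, v):
    """Phase of the periodic factor in Eq. 6.15."""
    return 2 * math.pi * (u * (x - p) + v * (y - q))

def phase_polar(x, y, p, q, f, theta):
    """Phase of the periodic factor in Eq. 6.16."""
    return 2 * math.pi * f * ((x - p) * math.cos(theta) + (y - q) * math.sin(theta))

f, theta = to_polar(0.5, 1.0)
u_back, v_back = to_cartesian(f, theta)
```

Converting to $(f, \theta)$ and back recovers $(u, v)$ up to rounding, and the two phase functions agree at any point, which is why Eqs. 6.15 and 6.16 describe the same function.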


The structure of Eq. 6.15 may be easier to understand by writing it in
vector form; let $\mathbf{x} = (x, y)$ be an arbitrary point in the plane, let $\mathbf{p} = (p, q)$ be
the center of the function, and let $\mathbf{u} = (u, v)$ be the wave vector (which represents
the packet’s frequency along each axis). Finally, let the diagonal matrix

$$S = \begin{pmatrix} \alpha^{-1} & 0 \\ 0 & \beta^{-1} \end{pmatrix}$$

represent the function’s shape. Then,

$$\phi_{\mathbf{pu}}(\mathbf{x}) = \exp\{-\pi\|S(\mathbf{x} - \mathbf{p})\|^2\}\,\exp[2\pi i\,\mathbf{u} \cdot (\mathbf{x} - \mathbf{p})]. \tag{6.17}$$

¹³ The standard deviation of the Gaussian on the $x$-axis is proportional to $\alpha$, and on the
$y$-axis to $\beta$; see Sec. 6.12.2.

Now it is clear that the 2D Gaussian envelope falls off exponentially with the square of the
distance from $\mathbf{p}$ scaled in accord with $S$. Similarly, the periodic part has its
origin at $\mathbf{p}$. Since $\mathbf{u} \cdot (\mathbf{x} - \mathbf{p})$ projects $\mathbf{x} - \mathbf{p}$ onto $\mathbf{u}$, the phase of the periodic
function is constant in a direction perpendicular to the wave vector $\mathbf{u}$. Thus
the orientation of the periodic part is given by $\mathbf{u}$ and its frequency is given
by $\|\mathbf{u}\| = f$.
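
The claim that the phase is constant perpendicular to the wave vector can be verified with a short sketch of the vector form (Eq. 6.17). Points and vectors are represented as plain tuples; the function names and the sample values are hypothetical.

```python
import cmath
import math

def phase(x, p, u):
    """Phase 2*pi*u.(x - p) of the periodic factor in Eq. 6.17."""
    return 2 * math.pi * (u[0] * (x[0] - p[0]) + u[1] * (x[1] - p[1]))

def gabor_vector(x, p, u, alpha, beta):
    """Eq. 6.17 with S = diag(1/alpha, 1/beta)."""
    sx = (x[0] - p[0]) / alpha        # first component of S(x - p)
    sy = (x[1] - p[1]) / beta         # second component of S(x - p)
    envelope = math.exp(-math.pi * (sx * sx + sy * sy))
    return envelope * cmath.exp(1j * phase(x, p, u))

p = (0.0, 0.0)
u = (0.5, 1.0)
perpendicular = (-u[1], u[0])         # direction orthogonal to the wave vector
x = (0.3, -0.4)
shifted = (x[0] + 0.7 * perpendicular[0], x[1] + 0.7 * perpendicular[1])
```

Moving from `x` to `shifted` changes the envelope but leaves the phase untouched, so the crests of the grating run perpendicular to $\mathbf{u}$.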

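The overall shape of the 2D Gabor elementary function is easiest to understand
in terms of its even-symmetric (cosine) and odd-symmetric (sine)
components, so we separate the periodic part of Eq. 6.17 to get
$\phi_{\mathbf{pu}}(\mathbf{x}) = C_{\mathbf{pu}}(\mathbf{x}) + i\,S_{\mathbf{pu}}(\mathbf{x})$, where

$$\begin{aligned}
C_{\mathbf{pu}}(\mathbf{x}) &= \exp\{-\pi\|S(\mathbf{x} - \mathbf{p})\|^2\}\,\cos[2\pi\,\mathbf{u} \cdot (\mathbf{x} - \mathbf{p})], \\
S_{\mathbf{pu}}(\mathbf{x}) &= \exp\{-\pi\|S(\mathbf{x} - \mathbf{p})\|^2\}\,\sin[2\pi\,\mathbf{u} \cdot (\mathbf{x} - \mathbf{p})].
\end{aligned}$$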


One of these 2D Gaussian sinusoids is shown in Fig. 6.14; it can be described
as an oriented grating patch. The 2D Gabor Uncertainty Principle can be
understood by looking at Fig. 6.15. On the left we see a schematic
representation of the even component of a 2D Gabor elementary function; on the
right we see a schematic representation of its Fourier transform. Now consider
Fig. 6.16, which shows a Gabor function like that in Fig. 6.15, but wider in the
x direction. Looking at the figure we can see that its increased width will
provide greater sensitivity to orientation, and this can be seen in the frequency
domain, where $\Delta\theta \approx 1/(f\,\Delta x)$ is smaller. 14 Thus there is a tradeoff between
localization in the conjugate variables $x$ and $\theta$, since $\Delta x\,\Delta\theta \approx 1/f$. Figure
6.17 shows the effect of stretching the Gabor function in the y dimension.
Just as for one-dimensional signals, the increased number of samples allows a
more accurate determination of the frequency, and so decreases $\Delta f = 1/\Delta y$.
Thus there is a tradeoff between localization in the conjugate variables $y$ and
$f$, since $\Delta y\,\Delta f = 1$.


14 The relationship $\Delta\theta \approx 1/(f\,\Delta x)$ holds for small $\Delta\theta$.
Figure 6.14: The even (cosine) component of a 2D Gabor elementary function.
The function shown has $\alpha^2 = \beta^2 = 20$, $u = 1/2$, $v = 1$, and $p = q = 0$.
It is plotted for all $x, y \in [-6, 6]$.



Figure 6.15: Schematic representation of even component of 2D Gabor ele¬ 
mentary function in space and frequency domains (adapted from Daugman 
(1985b)). 
Figure 6.16: Schematic representation of even component of 2D Gabor
elementary function in space and frequency domains showing $\Delta x$ vs. $\Delta\theta$ tradeoff
(adapted from Daugman (1985b)). Thus $x$ and $\theta$ are conjugate variables.



Figure 6.17: Schematic representation of even component of 2D Gabor
elementary function in space and frequency domains showing $\Delta y$ vs. $\Delta f$
tradeoff (adapted from Daugman (1985b)). Thus $y$ and $f$ are conjugate variables.
6.5 Evidence for 2D Gabor Representation in Vision

Daugman (1984, 1985a, 1993) summarizes the considerable physiological
evidence that 2D Gabor elementary functions are fundamental to visual
processing in several mammalian species. Since proponents of alternate
hypotheses, such as those based on wavelets or Laplacian edge detectors, will
have to demonstrate that they can account as well for these data, we briefly
review the results.

First, measurements of the receptive fields of simple cells in cat visual
cortex have shown them to be like Gaussian-modulated sinusoids (Jones &
Palmer 1987); Daugman has shown that 97% of them are statistically
indistinguishable from the odd- or even-symmetric parts of a 2D Gabor elementary
function.

Pollen and Ronner (1981) found a quadrature phase relation between
pairs of simple cells in the same cortical column; that is, adjacent simple
cells have grating patches that are 90° out of phase, but matched in preferred
orientation and frequency. These cells could be computing the odd- and
even-symmetric parts of the complex 2D Gabor function, in accord with Euler’s
formula:

$$e^{2\pi i f x} = \cos 2\pi f x + i \sin 2\pi f x.$$

Alternately, a simple cell could represent the sum of two Gabor elementary
functions, in accord with the formulas:

$$2 \cos 2\pi f x = e^{-2\pi i f x} + e^{2\pi i f x},$$

$$2 \sin 2\pi f x = i e^{-2\pi i f x} - i e^{2\pi i f x}.$$

Daugman (1993) presents an additional argument in favor of the 2D
Gabor elementary functions, which is based on their efficiency. An optimal
image coding scheme is given by principal components analysis via the
Karhunen-Loève transform. However, this method is dependent on the
particular image to be encoded. To get an image-independent encoding scheme,
we can make the reasonable assumption that the image statistics are locally
stationary, in which case the Karhunen-Loève transform is equivalent to a
windowed Fourier analysis in each of the local regions. The 2D Gabor
representation is a good approximation to this scheme.

Daugman (1984) also conducted a series of psychophysical experiments,
which allowed him to infer the tuning sensitivities of the entire visual
channel (in humans). These were masking experiments in which a fixed grating
was presented together with a grating that was variable in orientation and
spatial frequency. The experiments determined how much the fixed grating
interfered with the perception of the variable grating; this was determined
by measuring how much the fixed grating raised the threshold at which the
variable grating became visible. The underlying assumption is that the
neurons in the visual channels that are involved in perceiving the fixed grating
become fatigued or saturated, and so the variable grating is difficult to
perceive to the extent it shares the same neural channels. Hence the degree of
masking measures the response sensitivity of that neural channel to gratings
of other frequencies and orientations.

The results of these psychophysical experiments were consistent with the
neurophysiological data from cat visual cortex: visual channels have a
frequency bandwidth of 1-2 octaves and an orientation half-bandwidth ±15°
(i.e., 30° total angular bandwidth). Furthermore, the regions of sensitivity
in the spectral domain were elliptical and twice as large in the orientation
dimension as in the frequency dimension (i.e., corresponding to Fig. 6.17).
Such a sensitivity profile corresponds to a width/length ratio in the spatial
domain of $\lambda = \alpha/\beta = 1/2$, in good agreement with neurophysiological data
(Jones & Palmer 1987; Movshon 1979).

For optimal (minimum uncertainty) 2D Gabor filters, a relationship can
be calculated between the aspect ratio $\lambda$, the orientation half-bandwidth
$\Delta\theta_{1/2}$ and the spatial frequency bandwidth $r$ in octaves (Daugman 1985b): 15

$$\Delta\theta_{1/2} = \arcsin\left( \lambda\, \frac{2^r - 1}{2^r + 1} \right).$$

For the observed $\lambda = 1/2$ and $r = 1.5$ octaves, the formula gives
$\Delta\theta_{1/2} = 13.8°$, in good agreement with the observed 15°, and supporting the
hypothesis that receptive field profiles are close to Gabor elementary functions.
This is reinforced by calculating the area in Fourier space of the inferred
filters, which is about 2.5 times the Gabor minimum, whereas other idealized 2D
filters have areas of at least 6.5 times the minimum.
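
The quoted half-bandwidth can be checked numerically. The sketch below assumes the relationship has the form Δθ½ = arcsin[λ(2^r − 1)/(2^r + 1)], which reproduces the 13.8° figure for λ = 1/2 and r = 1.5; the function name is hypothetical.

```python
import math

def orientation_half_bandwidth_deg(aspect_ratio, octaves):
    """Orientation half-bandwidth (degrees) from aspect ratio and octave bandwidth."""
    ratio = 2.0 ** octaves
    return math.degrees(math.asin(aspect_ratio * (ratio - 1.0) / (ratio + 1.0)))

half_bw = orientation_half_bandwidth_deg(0.5, 1.5)
print(f"{half_bw:.1f} degrees")  # prints "13.8 degrees"
```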


15 That is, $2^r = f'/f$, where $f$ and $f'$ are focal frequencies of two filters; we are measuring
bandwidth by a ratio rather than a difference. In this case $r = 3/2$.
6.6 Problems with the Gabor Representation

One argument against the hypothesis that the vision system uses a 2D Gabor
representation is that the Gabor elementary functions are not strictly local;
that is, along with their Gaussian envelope, they stretch out to infinity.
In mathematical terms, they have noncompact support. This is biologically
implausible, since receptive fields are at least limited to the area of the retina,
and in fact more limited than that. (Daugman notes that receptive fields die
out after five or six extrema.) One answer to this argument is that the Gabor
representation is intended as an idealized mathematical model, and that we
shouldn’t expect it to be exactly realized in the biology. In any case, the
Gaussian envelope is well localized: 99.7% of its area is within 3 standard
deviations of the mean, 99.994% within 4 (see Figs. 6.11, 6.12, 6.14). A
receptive field that is statistically indistinguishable from a Gabor function in
97% of the cases is surely a good enough approximation to the mathematical
ideal.
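
The localization figures for the Gaussian envelope follow from the error function: the fraction of a Gaussian's area within k standard deviations of the mean is erf(k/√2). A short check (the helper name is hypothetical):

```python
import math

def area_within(k):
    """Fraction of a Gaussian's area within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

print(f"{100 * area_within(3):.1f}%")   # prints "99.7%"
print(f"{100 * area_within(4):.3f}%")   # prints "99.994%"
```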

A second argument against the Gabor representation is that it is
nonorthogonal. 16 One result of this is that it is comparatively difficult to compute
the coefficients of a 2D Gabor representation. For orthogonal representations,
such as the Fourier representation and orthogonal wavelet representations
(see the following section), the coefficients are computed by a simple inner
product. In contrast, an algorithm for computing the coefficients of a 1D
Gabor representation was not published before 1980, and Daugman uses an
iterative relaxation algorithm to compute the coefficients for the 2D Gabor
representation (Daugman 1993).

Further, Daugman claims that simulation studies have shown that
nonorthogonal representations can lead to various artifacts, including edge echoes
and spurious zero-crossings (Daugman 1993). The paradox is that, as
Daugman observes, nonorthogonal representations are ubiquitous in biological
sensory and motor systems. Thus, whatever the disadvantages of
nonorthogonality, nature seems to have found ways around them; we consider some of
the possibilities in Section 6.9.


16 Indeed, the 2D Gabor elementary functions (Eq. 6.15) do not even generate a frame,
when $\Delta x\,\Delta u = 1$ or $\Delta y\,\Delta v = 1$ (Heil & Walnut 1989, pp. 656-657; Daubechies et al. 1986,
p. 1274). They do generate a frame for certain values of $\Delta u < 1/\Delta x$ and $\Delta v < 1/\Delta y$, in
particular, for $\Delta u = 1/(m\,\Delta x)$, $\Delta v = 1/(n\,\Delta y)$ where $m, n = 2, 3, 4, \ldots$ (Daubechies et al.
1986, p. 1275). See also Section 6.9.
Figure 6.18: Examples of dyadic dilates of a mother wavelet. The dyadic
dilates of a mother wavelet $\phi(x)$ have the form $\phi(2^k x)$ for positive and negative
integers $k$. This figure shows the $k = -1, 0, 1$ dilates (a, b, c, respectively).


6.7 Gabor versus Wavelet Representations 

Wavelets have been proposed as an alternative to Gabor elementary functions 
as a basis for representation in the visual system. We limit ourselves to a 
brief introduction. 17 

A family of wavelets is a complete set of functions, all generated from a
mother wavelet by the operations of dilation and translation. Most commonly
we are concerned with dyadic dilations and translations: A dyadic dilate of
a function $\phi : \mathbb{R} \to \mathbb{R}$ is a function of the form $\phi(2^k x)$, for some integer $k$.
Thus $\phi(2^{-1} x)$ is $\phi$ dilated (stretched) by a factor of two (around the origin),
and $\phi(2x)$ is contracted (shrunk) by a factor of two (around the origin);
see Fig. 6.18. 18

A dyadic translate of a dilated function has the form $\phi(2^k x - j)$ for some
integer $j$. The effect of the translation is clearer if we write the function in
the equivalent form $\phi[2^k (x - j/2^k)]$, since then we see that the dilate $\phi(2^k x)$
has been translated to all the dyadic points $j/2^k$ (Fig. 6.19). Thus the general
form of the wavelets generated from mother wavelet $\phi$ by dyadic dilation and
translation is:

$$\phi_{jk}(x) = \phi(2^k x - j), \qquad j, k \in \mathbb{Z}. \tag{6.18}$$

Figure 6.20 shows a well-known mother wavelet, the Haar wavelet. 
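
To make the dyadic construction concrete, the sketch below generates members of the family φ(2^k x − j) from a Haar mother wavelet and checks a few inner products exactly. It assumes the standard Haar definition (+1 on [0, 1/2), −1 on [1/2, 1), 0 elsewhere), which may be positioned differently from the wavelet drawn in Fig. 6.20; the helper names `haar`, `wavelet` and `inner` are hypothetical.

```python
from fractions import Fraction

def haar(x):
    """Standard Haar mother wavelet (assumed form): +1, then -1, on [0, 1)."""
    if 0 <= x < Fraction(1, 2):
        return 1
    if Fraction(1, 2) <= x < 1:
        return -1
    return 0

def wavelet(j, k):
    """The dyadic wavelet x -> haar(2^k x - j)."""
    return lambda x: haar(Fraction(2) ** k * x - j)

def inner(f, g, lo=-4, hi=4, steps_per_unit=64):
    """Integral of f*g over [lo, hi] by the midpoint rule.

    Exact here because the wavelets used below are piecewise constant on
    intervals much longer than the grid spacing 1/steps_per_unit.
    """
    h = Fraction(1, steps_per_unit)
    total = 0
    for i in range((hi - lo) * steps_per_unit):
        x = lo + (i + Fraction(1, 2)) * h
        total += f(x) * g(x)
    return total * h

print(inner(wavelet(0, 0), wavelet(0, 1)))  # different scales: 0
print(inner(wavelet(0, 0), wavelet(1, 0)))  # different translates: 0
print(inner(wavelet(0, 1), wavelet(0, 1)))  # squared norm 2^(-k): 1/2
```

The vanishing inner products illustrate the orthogonality of the Haar family; the squared norm 2^(-k) shows why some authors rescale the dilates to keep the norm constant.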

17 Several good overviews of wavelets have been published, including Daubechies (1988), 
Strang (1989) and Heil & Walnut (1989). Our exposition is based mostly on Strang (1989). 

18 Some authors define the dilation by $2^{k/2} \phi(2^k x)$ so that its $L^2$ norm is the same as
that of the mother wavelet. This is convenient if one is trying to construct an orthonormal 
Figure 6.19: Examples of dyadic translates of a dyadic dilate of a mother
wavelet. The mother wavelet generates the family of wavelets $\phi(2^k x - j)$
for all integers $j$ and $k$. The first row of the figure depicts the $k = 0$ wavelets;
they are centered on the integers $j = 0, \pm 1, \pm 2, \ldots$ The second row shows the
$k = 1$ wavelets, centered on the half-integers $j/2 = 0, \pm 1/2, \pm 2/2, \pm 3/2, \ldots$ The
third row shows the $k = 2$ wavelets, centered on $j/4 = 0, \pm 1/4, \pm 2/4, \pm 3/4, \ldots$



Figure 6.20: The Haar mother wavelet, which generates an orthogonal family 
of wavelets. A principal disadvantage of this wavelet is its discontinuity. 
Next we make some observations about wavelets:

1. Although we have discussed wavelets in terms of a one-dimensional 
mother wavelet, it should be clear that wavelets of higher dimension 
can be defined in the same way: 

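$$\phi_{\mathbf{p}k}(\mathbf{x}) = \phi(2^k \mathbf{x} - \mathbf{p}), \tag{6.19}$$

for $\phi : \mathbb{R}^n \to \mathbb{R}$, $\mathbf{p} \in \mathbb{Z}^n$ and $k \in \mathbb{Z}$. Higher-dimensional wavelets are
necessary to model vision.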

2. Since a wavelet family is by definition complete, any (finite energy)
function $\psi$ can be represented by a (possibly infinite) linear superposition
of wavelets:

$$\psi = \sum_{j,k} c_{jk} \phi_{jk}.$$

This immediately raises the question of how the wavelet coefficients $c_{jk}$
can be computed; we take it up later.

3. Families of wavelets need not be orthogonal. Although the original
definition of ‘wavelet’ implied orthogonality, the term is now generally
used for any complete family generated by dilation and translation. We
use the term orthogonal wavelet for the mother wavelet of an orthogonal
family. (The Haar wavelets, which are based on the mother wavelet in
Fig. 6.20, are orthogonal.)

4. Although our pictures have suggested that wavelets are strictly local 
(i.e., of compact support), this is not necessarily the case. In fact, it 
is generally difficult to construct families of strictly local wavelets that 
are orthogonal, and the resulting basis functions tend to be irregular 
(Strang 1989, p. 615). 

Why use wavelets instead of other representations, such as the Fourier or
Gabor transforms? One reason is that wavelets permit functions to be
represented as linear superpositions of strictly local elementary functions. This
is especially important when the function to be represented is itself strictly
local, since in this case a representation in terms of nonlocal elementary
functions depends on enormous cancellation; think of the Fourier representation
of a pulse.


wavelet basis, but is unnecessary for our purposes here. 
Figure 6.21: Mallat’s Tree Algorithm for wavelet expansion implemented as
multilayer linear neural net. Matrix H represents a linear neural network
implementing a high-pass filter to extract the wavelet coefficients $b_j$ at each
level of resolution. Matrix L represents a linear neural network implementing
a low-pass filter that passes a “blurred” image $a_j$ on to the next stage for
processing.


Representing strictly local functions in terms of strictly local functions
makes sense, and wavelets are well suited to a strictly local representation.
If a mother wavelet $\phi$ is strictly local to an interval $[-L/2, L/2]$ around the
origin (i.e., its support is in this interval), then we can see that a wavelet
representation of $\psi$ is a multiresolution 19 decomposition of the function: the
wavelet coefficient $c_{jk}$ gives information about $\psi$ at a scale of $L/2^k$ and in
the region $j/2^k$.

Coefficients are easy to compute if the wavelets are orthogonal. The
simplest and most familiar way is via the inner product:

$$c_{jk} = \langle \psi, \phi_{jk} \rangle / \|\phi_{jk}\|^2.$$

There are also more efficient methods, such as Mallat’s Tree Algorithm
(Mallat 1989b, 1989a). Here we note only that the wavelet coefficients can be
computed by a simple multilayer linear neural network (Fig. 6.21). In this
algorithm, matrix H is a high-pass filter that computes the wavelet
coefficients $b_j$ at resolution level $2^{-j}$, and matrix L is a low-pass filter that passes


19 For a review, see Daubechies (1988). 
the “blurred” image on to the next stage.
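
The layered structure of the tree algorithm can be sketched for the simplest case. The following illustration assumes Haar filters (pairwise scaled sums for the low-pass stage L, pairwise scaled differences for the high-pass stage H) and a signal whose length is a power of two; it is not Mallat's general algorithm, and the function names are hypothetical.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)

def haar_step(signal):
    """One layer: L gives the blurred signal, H gives the wavelet coefficients."""
    low = [SCALE * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    high = [SCALE * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return low, high

def tree_decompose(signal):
    """Apply haar_step repeatedly to the blurred output.

    Returns the coarsest approximation and the detail vectors, finest first.
    The signal length is assumed to be a power of two.
    """
    details = []
    blurred = list(signal)
    while len(blurred) > 1:
        blurred, detail = haar_step(blurred)
        details.append(detail)
    return blurred, details

def tree_reconstruct(blurred, details):
    """Invert tree_decompose, working from the coarsest level back to the finest."""
    current = list(blurred)
    for detail in reversed(details):
        finer = []
        for lo, hi in zip(current, detail):
            finer.extend([SCALE * (lo + hi), SCALE * (lo - hi)])
        current = finer
    return current

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coarse, details = tree_decompose(signal)
print([len(d) for d in details])          # [4, 2, 1]
print(tree_reconstruct(coarse, details))  # recovers the signal up to rounding
```

Each layer halves the number of samples passed on, so the total work is proportional to the signal length, and no information is lost because every layer is invertible.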


6.8 Gabor Wavelets 

There are some obvious similarities between wavelets and Gabor elementary 
functions: they are both complete families of functions, the members of 
which are predominantly sensitive to variations at a particular scale and at 
a particular location (in space or time). 20 Indeed, the Gabor functions can 
be generated from a Gaussian mother function by translation and periodic 
modulation, in the same way that wavelets are generated from a mother 
wavelet by translation and dilation (Heil & Walnut 1989). 

Daugman unifies Gabor elementary functions and wavelets by defining
Gabor wavelets (Daugman 1993). These anisotropic (oriented) wavelets are
generated from a fixed Gabor elementary function (Eq. 6.17) by dilation,
translation and rotation. 21 Dilation, of course, also has the ancillary effect
of changing the frequency of the Gabor function. This fits well with
neurophysiological and psychophysical data indicating a log-polar distribution of
response selectivity in cells in the visual cortex, which show an orientation
half-bandwidth of ±15° and a frequency bandwidth of 1.5 octaves
(Daugman 1993). That is, a space in which polar angle represents orientation and
radial distance represents spatial frequency is efficiently covered by Gabor
filters with aspect ratio $\lambda = 1/2$, orientation a multiple of 30°, and central
frequency at radii in the ratio $2^{3/2}$ (Fig. 6.22).
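
The log-polar tiling can be enumerated directly. The sketch below lists the wave vectors of such a filter bank under two assumptions that are not stated in the text: orientations are taken over a half-turn (0° to 150°), since a real-valued filter at θ and θ + 180° is the same grating, and the base frequency `f0` and number of scales are arbitrary illustrative values.

```python
import math

def log_polar_bank(f0=1.0, n_scales=3, orientation_step_deg=30.0, octave_step=1.5):
    """Wave vectors (u, v) for filters at log-spaced frequencies and even orientations."""
    n_orientations = int(round(180.0 / orientation_step_deg))
    bank = []
    for k in range(n_scales):
        f = f0 * 2.0 ** (octave_step * k)      # focal frequency f_k = (2^(3/2))^k f0
        for m in range(n_orientations):
            theta = math.radians(m * orientation_step_deg)
            bank.append((f * math.cos(theta), f * math.sin(theta)))
    return bank

bank = log_polar_bank()
print(len(bank))   # 18 filters: 3 scales x 6 orientations
```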


6.9 The Orthogonality Issue 

Of course, Gabor wavelets are not orthogonal, so their attractive match to
the data is coupled with mathematical difficulties. But orthogonality is a
rather delicate property (functions either are or aren’t orthogonal; there
are no degrees of orthogonality), and so it is probably too fragile for
biology to be able to depend on it. Perhaps we should not be surprised that

20 Although the Gabor functions are not strictly local, their Gaussian envelope causes 
their greatest sensitivity to be concentrated near the center of that envelope. 

21 These are not true 2D wavelets, in the usual sense, which are generated from a mother 
wavelet by dilation and translation (Eq. 6.19). Thus true 2D Gabor wavelets would have 
the same orientation as the mother wavelet. 
Figure 6.22: Log-polar distribution of 2D Gabor filters. Shaded ellipses
represent envelopes of 2D Gabor filters with aspect ratio $\lambda = 1/2$. The filters
are oriented in multiples of $\Delta\theta = 30^\circ$ and have focal frequencies $f_0, f_1, f_2, \ldots$,
where $f_k = d^k f_0$ and $d = 2^{3/2}$. Notice how effectively ellipses of these
orientations, sizes and aspect ratios cover the space.
nonorthogonality is ubiquitous in biological systems; rather we should learn
how nature lives with it and even exploits it. 

The principal difficulty with a nonorthogonal set of elementary functions
is in computing the wavelet coefficients. In other words, although we know
that any finite-energy $\psi$ can be represented by a linear superposition of
the Gabor elementary functions, $\psi = \sum_{jklm} c_{jklm} \phi_{jklm}$, their
nonorthogonality means that the coefficients are not defined by a simple inner product,
$c_{jklm} = \langle \psi, \phi_{jklm} \rangle / \|\phi_{jklm}\|^2$. Daugman (1993) has described an iterative
relaxation algorithm for expanding an image in terms of Gabor wavelets or
other nonorthogonal codes; it operates by gradient descent in the squared $L^2$
error of representation:

$$\left\| \psi - \sum_{jklm} c_{jklm} \phi_{jklm} \right\|^2.$$
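
The idea of relaxation by gradient descent can be illustrated on a toy problem. The sketch below is a generic gradient descent on the squared error for two nonorthogonal unit vectors in the plane; it is not Daugman's published algorithm, and the step size, iteration count and names are illustrative assumptions. It starts from the inner products and shows that they differ from the true expansion coefficients.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def relax(signal, basis, rate=0.1, steps=2000):
    """Gradient descent on ||signal - sum_n c_n phi_n||^2 over the coefficients c_n."""
    coeffs = [dot(signal, phi) for phi in basis]   # initial estimate: inner products
    for _ in range(steps):
        approx = [sum(c * phi[i] for c, phi in zip(coeffs, basis))
                  for i in range(len(signal))]
        residual = [s - a for s, a in zip(signal, approx)]
        # The gradient with respect to c_n is -2 <residual, phi_n>.
        coeffs = [c + 2.0 * rate * dot(residual, phi)
                  for c, phi in zip(coeffs, basis)]
    return coeffs

basis = [(1.0, 0.0), (0.6, 0.8)]     # unit vectors with inner product 0.6
signal = (2.0, 1.0)
print([dot(signal, phi) for phi in basis])   # inner products: [2.0, 2.0]
print(relax(signal, basis))                  # converges to about [1.25, 1.25]
```

Here signal = 1.25·(1, 0) + 1.25·(0.6, 0.8), so the descent recovers coefficients that the raw inner products overestimate.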


It is unlikely that such iterative algorithms are implemented in biological
neural networks, since their speed is limited by the neuron impulse rate (say
1 msec. per impulse, with many impulses required to represent an analog
quantity). 22 On the other hand, iteration in local circuits in the dendritic net
does not depend on impulse generation, and so could proceed much faster.

Although the Gabor coefficients cannot be computed by inner products,
the evidence from receptive field studies is that the primary visual system
does compute inner products, so we must question their functional role. 23
One possibility is that the inner products may be good estimates of the Gabor
coefficients, and so a good place to start the relaxation process. (Daugman’s
algorithm does this.)

A second possibility we consider is that although the Gabor wavelets are 
not orthogonal, they may nevertheless be a frame (Heil & Walnut 1989), 

22 There is at most 1 kHz of available bandwidth since that is the maximum spike rate.
Therefore, to distinguish $N$ discrete values, we need frequencies separated by at least
$\Delta f = 1000/N$. Applying the Gabor Uncertainty Principle gives $\Delta t = N/1000$ seconds, or
$N$ milliseconds to reliably transmit the value. Thus it takes at least 10 msec. to transmit
an analog value with one digit of precision and at least 100 msec. to transmit it with two
digits of precision.

23 Here, “primary visual system” refers to the retina, lateral geniculate nucleus and
primary visual cortex (V1). Since the representational primitives of the retina + LGN
system seem to be either radially symmetric differences of Gaussians or radially symmetric
Gaussian sinusoids (Pribram 1991, p. 74), the Gabor coefficients must be computed from
the coefficients of these radial basis functions.
which is a generalization of a basis. If the functions $\phi_{jklm}$ are a frame, then
there is a bounded linear operator $S$ such that

$$\psi = \sum_{jklm} \langle \psi, S^{-1} \phi_{jklm} \rangle\, \phi_{jklm} = \sum_{jklm} \langle \psi, \phi_{jklm} \rangle\, S^{-1} \phi_{jklm}.$$

In other words, the inner products $\langle \psi, \phi_{jklm} \rangle$, which are apparently computed
by the primary visual system, give the representation of $\psi$ in terms of the
dual frame $\{S^{-1} \phi_{jklm}\}$.
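
A finite-dimensional example may help fix the idea of a dual frame. The sketch below uses three unit vectors in the plane at 120° to one another, an overcomplete and nonorthogonal set chosen purely for illustration; it forms the frame operator S (which maps v to the sum of ⟨v, φ_n⟩ φ_n), inverts it, and rebuilds a vector from its inner products with the frame using the dual vectors S⁻¹φ_n. All names are hypothetical.

```python
import math

def frame_operator(frame):
    """2x2 matrix of S, where S v = sum_n <v, phi_n> phi_n."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for phi in frame:
        for i in range(2):
            for j in range(2):
                s[i][j] += phi[i] * phi[j]
    return s

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def apply(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

angles = [math.pi / 2 + 2.0 * math.pi * n / 3 for n in range(3)]
frame = [(math.cos(a), math.sin(a)) for a in angles]
s_inv = inverse_2x2(frame_operator(frame))
dual = [apply(s_inv, phi) for phi in frame]

psi = (0.7, -1.3)
products = [psi[0] * phi[0] + psi[1] * phi[1] for phi in frame]   # <psi, phi_n>
rebuilt = [sum(c * d[i] for c, d in zip(products, dual)) for i in range(2)]
print(rebuilt)   # recovers psi up to rounding
```

For this particular frame S is 3/2 times the identity, so the dual vectors are simply rescaled copies of the frame vectors; for a less symmetric frame S is a genuine matrix and the same code applies.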

Now we must address the question of whether the Gabor wavelets are a
frame. It has been known for some time that the 1D Gabor wavelets are not
a frame for $\Delta f\,\Delta t = 1$, but Daubechies et al. (1986, p. 1275) show that they
are a frame for $\Delta t = 1/(m\,\Delta f)$ where $m = 2, 3, 4, \ldots$ 24

Since a 2D Gabor wavelet is the outer product of two 1D Gabor wavelets,
$\phi_{jklm}(x, y) = \phi_{jl}(x)\,\phi_{km}(y)$, it is straightforward to show that the 2D wavelets
are a frame when $\Delta x\,\Delta u = 1/m$ and $\Delta y\,\Delta v = 1/n$, where $m, n = 2, 3, 4, \ldots$
These conditions are compatible with the constraints imposed by the Gabor
Uncertainty Principle. For example, $m = n = 12$ gives $\Delta x\,\Delta u = \Delta y\,\Delta v =
1/12$, which is slightly larger than the minimum $1/4\pi \approx 1/12.6$. Further,
they are consistent with Daugman’s (1984) observation that receptive fields
occupy about 2.5 times the theoretical minimum area, since in the case
$m = n = 8$ the functions occupy $16\pi^2/64 \approx 2.47$ times the minimum area.
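
These figures are easy to verify. The sketch below computes the area of a lattice cell with products 1/m and 1/n relative to the minimum area (1/4π)², and also reports the largest integer m for which 1/m does not fall below the 1/4π bound; the names are hypothetical.

```python
import math

MINIMUM = 1.0 / (4.0 * math.pi)   # lower bound on each product, about 1/12.57

def area_ratio(m, n):
    """Area (1/m)(1/n) of a cell relative to the minimum area (1/4 pi)^2."""
    return (1.0 / m) * (1.0 / n) / MINIMUM ** 2

print(round(area_ratio(8, 8), 2))   # 2.47

largest_m = max(m for m in range(2, 100) if 1.0 / m >= MINIMUM)
print(largest_m)                    # 12
```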

Finally, we observe that there is really no a priori reason for the visual
cortex to compute the Gabor coefficients, because there is no need for it to
reconstitute the input image $\psi$ from the coefficients:

$$\psi = \sum_{jklm} c_{jklm} \phi_{jklm}.$$

It must be remembered that this equation is only a mathematically
convenient way of guaranteeing that no information is lost in computing the
coefficients. Since the visual cortex harbors no homunculus, it does not need
to reconstruct the image, and it may work directly in terms of the inner
products.


24 These conditions are sufficient, but perhaps not necessary. Also note that for larger
$m$ the frame is tighter, which means $S$ is more nearly a scalar.
6.10 3D Gabor Representation of Spatiotemporal Signals

Gabor’s research was motivated in part by the observation that our
perception of sound is simultaneously of duration and pitch, and therefore that an
analysis of sound should be in terms of elements localized in both duration
and frequency (Gabor 1946, pp. 431-432). Exactly the same argument may
be made for vision. In the spatial domain we see simultaneously both
extent and texture (spatial frequency). Likewise, in the temporal domain we
perceive simultaneously duration and motion (temporal frequency).

Thus we see that the use of 2D Gabor elementary functions to model
visual image representation is unrealistic in a significant way: it ignores the
temporal structure of images. It is as though vision were merely a succession
of separate images, each independent of the next. On the other hand, if we
applied to visual images the 1D Gabor functions (Section 6.3), we would
capture their temporal structure, but not their spatial structure, which Daugman
and others have shown to be central to understanding vision.

An obvious solution to this problem is to combine the two analyses and
consider the evolution in time of two-dimensional spatial signals. Thus we
will take the input to the visual cortex to be a three-dimensional signal
$\psi(x, y, t)$, $\psi : \mathbb{R}^3 \to \mathbb{R}$. Sometimes it will be more convenient to write $\psi(\mathbf{x}, t)$
where the vector $\mathbf{x} = (x, y)$ represents spatial position. Such a signal has a
Fourier transform $\Psi(\xi, \eta, \nu)$, where $\xi$ and $\eta$ are spatial frequencies and $\nu$ is
a temporal frequency.

Having seen the 1D and 2D Gabor Uncertainty Principles, it is perhaps
hardly surprising that there is a 3D Uncertainty Principle holding between
pairs of conjugate variables:

$$\begin{aligned}
\Delta x\,\Delta\xi &\geq 1/4\pi, \\
\Delta y\,\Delta\eta &\geq 1/4\pi, \\
\Delta t\,\Delta\nu &\geq 1/4\pi,
\end{aligned}$$

where we have defined the nominal spreads in terms of the standard
deviations of the functions. For a proof of this uncertainty principle, see Sec.
6.12.1.

It is also straightforward to show (Sec. 6.12.2) that the Gabor inequalities
become equalities for the 3D Gabor elementary functions, which have the
form:

$$\phi_{pqruvw}(x, y, t) = \exp\left\{ -\pi \left[ \frac{(x - p)^2}{\alpha^2} + \frac{(y - q)^2}{\beta^2} + \frac{(t - r)^2}{\gamma^2} \right] \right\} \times \exp\{ 2\pi i [u(x - p) + v(y - q) + w(t - r)] \}. \tag{6.20}$$


The wave packet is localized around space-time coordinates $(p, q, r)$; that is,
it is centered at $x = p$, $y = q$ in space and $t = r$ in time. Its location
in the corresponding frequency domain is given by $(u, v, w)$, its two spatial
frequencies and one temporal frequency. This is apparent from the Fourier
transform of $\phi$:

$$\Phi_{pqruvw}(\xi, \eta, \nu) = \exp\{ -\pi [(\xi - u)^2 \alpha^2 + (\eta - v)^2 \beta^2 + (\nu - w)^2 \gamma^2] \} \times \exp\{ 2\pi i [x(\xi - u) + y(\eta - v) + t(\nu - w)] \}. \tag{6.21}$$

It can be shown (Sec. 6.12.2) that the standard deviations of $\phi$ around the
x, y and t axes are proportional to $\alpha$, $\beta$ and $\gamma$, respectively; thus $\alpha$, $\beta$ and
$\gamma$ determine the wave packet’s shape. Conversely, the standard deviations of
$\Phi$ are proportional to $\alpha^{-1}$, $\beta^{-1}$ and $\gamma^{-1}$.

As before, the Gabor uncertainty relations permit signals to be localized
in Fourier space no more accurately than a cell of size

$$\Delta x\,\Delta y\,\Delta t\,\Delta\xi\,\Delta\eta\,\Delta\nu \geq 1/64\pi^3.$$

Indeed, the information cells can be no smaller than $1/4\pi$ in each pair of
conjugate variables. Such cells are the information quanta for 3D signals (Fig.
6.23). The elementary information cells can be indexed by sextuples of
quantum numbers, three spatiotemporal and three spectral, $\mathbf{m} = (m_1, m_2, m_3)$,
$\mathbf{n} = (n_1, n_2, n_3)$, so that

$$\begin{aligned}
p &= m_1 \Delta x, & u &= n_1 \Delta\xi, \\
q &= m_2 \Delta y, & v &= n_2 \Delta\eta, \\
r &= m_3 \Delta t, & w &= n_3 \Delta\nu.
\end{aligned}$$

Then the complex numbers $c_{\mathbf{mn}}$ are the Gabor coefficients of $\psi$ if and only if

$$\psi = \sum_{\mathbf{m}, \mathbf{n}} c_{\mathbf{mn}}\, \phi_{pqruvw},$$

where the indices $\mathbf{m}$ and $\mathbf{n}$ have ranges appropriate to the spatiotemporal
extent and bandwidth of $\psi$. Of course, $\psi$ could be equally well represented
by real coefficients and Gaussian sinusoids in quadrature phase.
Figure 6.23: Information cells defining a 3D signal. For clarity the three
spatiotemporal dimensions are shown separate from the three spectral
dimensions, but it must be borne in mind that each information cell is a
six-dimensional rectangular space. Each such cell represents a quantum of
information in Fourier space.
Next we consider the significance to vision of this quantization of Fourier
space. Although it might seem natural to interpret it as a mathematical
fiction, Daugman’s research has already given us good reason to believe that
the visual cortex is organized around a spatial Gabor representation. This
suggests that we take the spatiotemporal Gabor representation quite literally
and interpret the visual cortex as a bank of filters tuned to spatial frequency
bands of width $\Delta\xi$ and $\Delta\eta$, a temporal frequency band of width $\Delta\nu$, and
localized to a spatial region of size $\Delta x\,\Delta y$. We further hypothesize that these
filters accumulate information over an interval of time that is a small multiple
of $\Delta t$, and produce the Gabor coefficients at the end of this interval (perhaps
by relaxation during the next interval). 25

It is natural to identify this interval with the principal rhythm of the
occipital (visual) cortex, the alpha rhythm. Slow rhythms, such as the alpha,
seem to clock the generation of spike trains, just as we would expect if a
set of rate-encoded Gabor coefficients were computed during each interval. 26
During periods of greater activity the alpha rhythm “desynchronizes” and
is replaced by a higher frequency oscillation (40-60 Hz.). The results of
such a decrease in $\Delta t$ include greater temporal resolution, poorer
temporal-frequency resolution, and less accurate computation of the Gabor coefficients,
all reasonable tradeoffs in situations demanding action.

If the hypothesized correlation of $\Delta t$ with the alpha rhythm is correct,
then from the resting alpha frequency, 8 to 12 Hz., we can estimate the
resting interval $T_\alpha \approx 100$ msec., with a range of perhaps 80 to 125 msec.
Since $\Delta t$ is the standard deviation of the Gabor elementary function, we can
expect that $T_\alpha$ must be 3 or 4 times $\Delta t$ (so that $T_\alpha$ contains 90-95% of the
wave-packet). Since in Sec. 6.12.2 we show $\Delta t = \gamma / 2\sqrt{\pi}$ (Eq. 6.30), for
mathematical convenience we estimate

$$T_\alpha \approx 2\sqrt{\pi}\,\Delta t = \gamma.$$

This implies a resting temporal frequency resolution of

$$\Delta\nu_\alpha = \frac{1}{4\pi\,\Delta t} = \frac{1}{4\pi (T_\alpha / 2\sqrt{\pi})} \approx 2.8 \text{ Hz}.$$
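
The estimate can be reproduced for the whole range of resting intervals mentioned above. The sketch assumes the relations T ≈ 2√π Δt and a frequency resolution of 1/(4π Δt); the function name is hypothetical.

```python
import math

def temporal_resolution_hz(interval_s):
    """Frequency resolution 1/(4 pi dt), with dt = interval / (2 sqrt(pi))."""
    dt = interval_s / (2.0 * math.sqrt(math.pi))
    return 1.0 / (4.0 * math.pi * dt)

for interval in (0.080, 0.100, 0.125):
    print(f"T = {1000 * interval:.0f} msec: {temporal_resolution_hz(interval):.2f} Hz")
```

For the nominal 100 msec. interval this gives about 2.82 Hz, in line with the 2.8 Hz quoted above; shorter intervals give coarser frequency resolution.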


25 We may compare the inhibitory wave that seems to reset cerebellar computation every 
500 msec. (Pribram 1991, p. 127). 

26 We refer here to Bland’s studies of the theta rhythm in the dentate gyrus of the rabbit 
(Bland et al. 1978), but the same principle applies to the alpha rhythm. 
The Gabor representation also sheds light on the perception of form and
motion. 27 The parameters $u$, $v$ and $w$ determine the orientation of the
elementary signal in space-time. For example, if $w = 0$ then the wave packet
is perpendicular to the t-axis (Fig. 6.24) and we have the effect of a 2D
(spatial) Gabor function, but localized in time. Conversely, if $w \neq 0$ then
the elementary function is inclined to the time and space axes (Fig. 6.25).
We can see that such a filter would respond to a grating patch moving at
a fixed velocity perpendicular to the fringes. We hypothesize that 3D
Gabor elementary functions of this kind explain the response characteristics of
complex and hypercomplex cells in the visual cortex, which have been shown
to respond to moving bars and gratings.

We can easily calculate the phase velocity at which the fringes move:

$$v_p = w/f,$$

where $f = \|\mathbf{u}\|$ is the spatial frequency of the grating patch. The fringes
move in a direction opposite to the (spatial) wave vector $\mathbf{u}$ (Fig. 6.25), so
the velocity vector $\mathbf{v}$ of the fringes is $-v_p$ times the wave normal $\mathbf{u}/\|\mathbf{u}\|$:

$$\mathbf{v} = -v_p\, \mathbf{u}/\|\mathbf{u}\| = -w \mathbf{u}/f^2.$$
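
As a worked example, the sketch below evaluates the phase velocity w/f and the fringe velocity vector −w(u, v)/f² for the parameter values given in the caption of Fig. 6.25 (u = v = 1/4, w = 1/√8). The function name is hypothetical, and the case of a zero spatial wave vector is rejected because no fringe velocity is defined there.

```python
import math

def fringe_velocity(u, v, w):
    """Return (phase speed, velocity vector) for spatial wave vector (u, v)
    and temporal frequency w."""
    f = math.hypot(u, v)                 # spatial frequency f = ||u||
    if f == 0.0:
        raise ValueError("no spatial grating: the wave vector is zero")
    v_p = w / f
    return v_p, (-w * u / f ** 2, -w * v / f ** 2)

v_p, velocity = fringe_velocity(0.25, 0.25, 1.0 / math.sqrt(8.0))
print(v_p)        # approximately 1
print(velocity)   # points opposite to (u, v), with length v_p
```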

We consider briefly the case in which the Gabor function is parallel to 
the time axis (Fig. 6.26), that is, u = 0. Such a filter would respond to a 
uniform intensity (within its spatial receptive field) oscillating at a frequency 
w. We are unaware of research looking for cells with this kind of response, 
but it is interesting that much of the work on receptive fields has made use 
of flashing spots, and so might be consistent with the existence of such cells. 


6.11 Conclusions 

We have reviewed Gabor’s Uncertainty Principle and Daugman’s evidence
for 2D Gabor filters in the visual cortex. We compared the Gabor
representation with wavelet-based representations, and concluded that the Gabor
representation is preferable. This is in spite of the Gabor functions not being
27 Of course, so would other representations in terms of signals localized in both the
space-time and spectral domains, such as other 3D Fourier transforms windowed in space- 
time, or 3D wavelets. 
Figure 6.24: Slice through even (cosine) component of 3D Gabor elementary
function oriented perpendicular ($w = 0$) to the time axis (ordinate). The
abscissa is taken to be along the wave-vector $\mathbf{u}$, and so perpendicular to the
spatial wavefronts. Such a filter is selective for stationary spatial frequency
$f$, localized in both space and time. This function has $f = 1/2$, $\alpha^2 + \beta^2 =
\gamma^2 = 20$, $u = v = 1/\sqrt{8}$, $w = 0$, $p = 0$ and $r = 0$. Lighter regions are more
positive, darker more negative.
Figure 6.25: Slice through even (cosine) component of 3D Gabor elementary
function inclined to the time axis ($w \neq 0$). Such a filter is selective for a
spatial grating of frequency $f$, moving at velocity $v_p$, and localized in both
space and time. This function has $\alpha^2 + \beta^2 = \gamma^2 = 20$, $u = v = 1/4$,
$w = 1/\sqrt{8}$, $p = 0$ and $r = 0$. It is selective for fringes of frequency $f = 1/\sqrt{8}$
moving at a phase velocity $v_p = w/f = 1$ to the left.
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Figure 6.26: Slice through even (cosine) component of 3D Gabor elementary 
function oriented parallel to the time axis (u = 0). Such a filter is selective 
for a spatially uniform intensity oscillating at frequency w, localized in both 
space and time. This function has a 2 + /3 2 = y 2 = 20, u = v = 0, w = 1/2, 
p = 0 and r = 0. 
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orthogonal. Indeed we argued that a nonorthogonal set of elementary func¬ 
tions might be preferable in a biological context. Finally we argued that since 
vision must be understood in terms of images evolving in time, the appropri¬ 
ate representational primitives are 3D Gabor functions. We suggested that 
these functions could explain the selectivity for moving edges exhibited by 
complex and hypercomplex cells in visual cortex, and we suggested that the 
alpha rhythm may correspond to the interval at which the Gabor coefficients 
are computed. More concrete predictions will depend on finding empirical 
data to constrain the parameters of the Gabor elementary functions. 


6.12 Appendix: Proofs 

6.12.1 Proof of General Gabor Uncertainty Principle 

We prove a general Gabor Uncertainty Principle for $n$-dimensional
functions.$^{28}$ Let $\phi$ be a function and $\Phi$ its Fourier transform; for
convenience we assume $\|\phi\| = \|\Phi\| = 1$ (this is just a matter of units).
We will also assume that $\phi$ decays to 0 at infinity; specifically we assume
$s_k \phi^2(s_1, \ldots, s_n) \to 0$ for all $k$. For this proof we will not be
able to use the simple definition of nominal spread that we used for 1D signals;
instead we define the nominal spread of a signal along the $k$th axis to be its
standard deviation along that axis. Thus the spread along the $k$th axis is
given by

\[ \Delta s_k = \sqrt{\mathrm{Var}_k\{\phi\}}, \]

where $\mathrm{Var}_k$ is the variance along the $k$th axis. Similarly, in the
spectral domain we define,

\[ \Delta\sigma_k = \sqrt{\mathrm{Var}_k\{\Phi\}}. \]

The variances are given by:

\[ \mathrm{Var}_k\{\phi\} = \|s_k\phi\|^2 = \int_S |s_k\phi|^2 \, d\mathbf{s} = \int_S \phi\, s_k^2\, \phi^* \, d\mathbf{s}, \]

\[ \mathrm{Var}_k\{\Phi\} = \|\sigma_k\Phi\|^2 = \int_\Sigma |\sigma_k\Phi|^2 \, d\boldsymbol{\sigma} = \int_\Sigma \Phi\, \sigma_k^2\, \Phi^* \, d\boldsymbol{\sigma}, \]

where $S = \Sigma = \mathbb{R}^n$, $\mathbf{s} = (s_1, \ldots, s_n) \in S$ and
$\boldsymbol{\sigma} = (\sigma_1, \ldots, \sigma_n) \in \Sigma$. Our goal will be
to show

\[ \mathrm{Var}_k\{\phi\}\, \mathrm{Var}_k\{\Phi\} \geq \frac{1}{16\pi^2}. \]

28 The proof is a generalization of that in Hamming (1989, pp. 181-184), which is based
on Gabor (1946), which is in turn based on the Heisenberg-Weyl derivation of the
uncertainty principle in physics. We have already proved a more general uncertainty
principle (Prop. 5.2.22, p. 93), but the following proof is more informative in the
specific application of Gabor functions.

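By the Schwarz inequality we know

\[ \|s_k\phi\|^2 \left\| \frac{\partial\phi}{\partial s_k} \right\|^2 \geq \left| \left\langle s_k\phi \;\middle|\; \frac{\partial\phi}{\partial s_k} \right\rangle \right|^2, \tag{6.22} \]

where the bracketed expression on the right is an inner product. Since
the Fourier transform is an isometry, it preserves the norm, so the norm
of $\partial\phi/\partial s_k$ is the same as the norm of its Fourier transform,
which is $2\pi i \sigma_k \Phi$. Therefore we can rewrite the left-hand side of
Eq. 6.22 as follows:

\begin{align*}
\|s_k\phi\|^2 \, \|\partial\phi/\partial s_k\|^2
  &= \|s_k\phi\|^2 \, \|2\pi i \sigma_k \Phi\|^2 \\
  &= 4\pi^2 \|s_k\phi\|^2 \, \|\sigma_k\Phi\|^2 \\
  &= 4\pi^2 \, \mathrm{Var}_k\{\phi\} \, \mathrm{Var}_k\{\Phi\}. \tag{6.23}
\end{align*}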
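The isometry step can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes a 1D Gaussian $\phi(s) = \exp(-\pi s^2/\alpha^2)$ with an arbitrarily chosen width `alpha`, approximates the continuous Fourier transform by a scaled FFT (which differs from it only by a phase factor that does not affect the modulus), and compares $\|\partial\phi/\partial s\|^2$ with $\|2\pi i \sigma \Phi\|^2$. The grid size `N` and window length `L` are likewise our choices.

```python
import numpy as np

alpha = 1.7                       # hypothetical Gaussian width
N, L = 4096, 40.0                 # number of samples and window length (assumed)
ds = L / N
s = (np.arange(N) - N // 2) * ds  # sample points covering [-L/2, L/2)

phi = np.exp(-np.pi * s**2 / alpha**2)
dphi = -2 * np.pi * s / alpha**2 * phi   # analytic derivative of phi

# A scaled FFT approximates the continuous Fourier transform up to a phase
# factor, which is irrelevant here because only |Phi| enters the norms.
Phi = ds * np.fft.fft(phi)
sigma = np.fft.fftfreq(N, d=ds)   # frequencies in cycles per unit of s
dsigma = 1.0 / (N * ds)           # spacing of the frequency grid

lhs = np.sum(np.abs(dphi) ** 2) * ds                            # ||d(phi)/ds||^2
rhs = np.sum(np.abs(2j * np.pi * sigma * Phi) ** 2) * dsigma    # ||2 pi i sigma Phi||^2
print(lhs, rhs)
```

The two printed values should agree to within floating-point rounding, since the Gaussian is negligible at the edges of both the spatial window and the frequency band.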


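Now we work on the right-hand side of Eq. 6.22:

\[ \left\langle s_k\phi \;\middle|\; \frac{\partial\phi}{\partial s_k} \right\rangle
   = \int_S s_k \phi \frac{\partial\phi^*}{\partial s_k} \, d\mathbf{s}
   = \int_{S'} \int_{\mathbb{R}} s_k \phi \frac{\partial\phi^*}{\partial s_k} \, ds_k \, d\mathbf{s}', \tag{6.24} \]

where $\mathbf{s}' = (s_1, \ldots, s_{k-1}, s_{k+1}, \ldots, s_n) \in S' = \mathbb{R}^{n-1}$.
We apply integration by parts to the innermost integral ($U = s_k\phi$,
$V = \phi^*$):

\begin{align*}
\int s_k \phi \frac{\partial\phi^*}{\partial s_k} \, ds_k
  &= \int s_k \phi \, d\phi^* \\
  &= s_k \phi \phi^* \Big|_{-\infty}^{\infty} - \int \phi^* \, d(s_k\phi) \\
  &= s_k |\phi|^2 \Big|_{-\infty}^{\infty} - \int \phi^* s_k \, d\phi - \int \phi^* \phi \, ds_k.
\end{align*}

By our assumption that $s_k\phi^2 \to 0$ we know
$s_k |\phi|^2 \big|_{-\infty}^{\infty} = 0$. The integrals
$\int s_k \phi \, d\phi^*$ and $\int \phi^* s_k \, d\phi$ are complex conjugates
of each other, so they coincide when the integral is real (in particular for
real $\phi$); in that case

\[ 2 \int s_k \phi \, d\phi^* = -\int \phi^* \phi \, ds_k, \]

and so

\[ \int s_k \phi \, d\phi^* = -\frac{1}{2} \int \phi^* \phi \, ds_k. \]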


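Substituting this into Eq. 6.24 yields

\begin{align*}
\left\langle s_k\phi \;\middle|\; \frac{\partial\phi}{\partial s_k} \right\rangle
  &= \int_{S'} \left( -\frac{1}{2} \int \phi^* \phi \, ds_k \right) d\mathbf{s}' \\
  &= -\frac{1}{2} \int_S \phi^* \phi \, d\mathbf{s} \\
  &= -1/2, \tag{6.25}
\end{align*}

since $\phi$ is normalized (by assumption). Therefore, combining Eqs. 6.22, 6.23
and 6.25,

\[ 4\pi^2 \, \mathrm{Var}_k\{\phi\} \, \mathrm{Var}_k\{\Phi\} \geq 1/4, \]

and so,

\[ \mathrm{Var}_k\{\phi\} \, \mathrm{Var}_k\{\Phi\} \geq 1/16\pi^2. \]

We have proved the general Uncertainty Principle,

\[ \Delta s_k \, \Delta\sigma_k \geq 1/4\pi. \]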
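To build intuition for the bound, the following Python sketch estimates the product $\Delta s\,\Delta\sigma$ for two real 1D test functions centered at the origin. It is a minimal illustration, not part of the proof: it assumes real $\phi$, replaces the spectral variance by $\|\partial\phi/\partial s\|^2 / (4\pi^2\|\phi\|^2)$ (the isometry step used for Eq. 6.23), and approximates derivatives and integrals on a uniform grid. The helper name `spread_product`, the Gaussian width 1.7 and the choice of a hyperbolic secant as the second test function are ours.

```python
import numpy as np

def spread_product(phi, s):
    """Estimate (Delta s)(Delta sigma) for a real function phi sampled on the
    uniform grid s and centered at the origin. The spectral variance is
    obtained from ||d(phi)/ds||^2 = 4 pi^2 ||sigma Phi||^2."""
    ds = s[1] - s[0]
    norm2 = np.sum(phi**2) * ds                  # ||phi||^2
    var_s = np.sum(s**2 * phi**2) * ds / norm2   # normalized spatial variance
    dphi = np.gradient(phi, ds)                  # finite-difference derivative
    var_sigma = np.sum(dphi**2) * ds / (4 * np.pi**2 * norm2)
    return np.sqrt(var_s * var_sigma)

s = np.linspace(-30.0, 30.0, 60001)
bound = 1 / (4 * np.pi)

gauss = spread_product(np.exp(-np.pi * s**2 / 1.7**2), s)   # Gaussian envelope
sech = spread_product(1 / np.cosh(s), s)                    # non-Gaussian pulse
print(bound, gauss, sech)
```

Up to discretization error the Gaussian attains the bound $1/4\pi$, while the hyperbolic secant gives a strictly larger product (analytically $1/12$).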

6.12.2 Proof of Optimality of General Gabor Elementary Functions

Our task is to show that the $n$-dimensional Gabor elementary functions
achieve the minimum area in $2n$-dimensional Fourier space. Thus we must
show

\[ \frac{\|s_k\phi\|^2 \, \|\sigma_k\Phi\|^2}{\|\phi\|^2 \, \|\Phi\|^2} = \frac{1}{16\pi^2}. \tag{6.26} \]

Without loss of generality we assume that $\phi$ and $\Phi$ are centered at the
origin (since this won’t alter their variance). Notice that both functions can
be written as a cisoid (complex sinusoid) times a product of Gaussians:

\[ \phi(\mathbf{s}) = \exp(2\pi i\, \mathbf{s} \cdot \mathbf{u}) \prod_{j=1}^{n} \exp(-\pi s_j^2/\alpha_j^2), \tag{6.27} \]

\[ \Phi(\boldsymbol{\sigma}) = \exp(2\pi i\, \mathbf{p} \cdot \boldsymbol{\sigma}) \prod_{j=1}^{n} \exp(-\pi \sigma_j^2 \alpha_j^2). \tag{6.28} \]

When we compute the norms and variances of these functions the periodic
parts can be ignored, since they have a constant modulus 1. Therefore,

\[ \|\phi\|^2 = \prod_{j=1}^{n} \int \exp(-2\pi s_j^2/\alpha_j^2) \, ds_j. \tag{6.29} \]


The integrands are Gaussians, so to understand their structure better, rewrite
them as normal distributions:

\[ \exp(-2\pi s_j^2/\alpha_j^2) = \frac{\alpha_j}{\sqrt{2}} \left\{ \frac{1}{\sqrt{2\pi}\,(\alpha_j/2\sqrt{\pi})} \exp\left[ -\frac{s_j^2}{2(\alpha_j^2/4\pi)} \right] \right\}. \]

The expression in curly braces is a normal distribution with mean $= 0$ and
variance $\sigma^2 = \alpha_j^2/4\pi$. Therefore rewrite it $N_{\alpha_j}(s_j)$:

\[ \exp(-2\pi s_j^2/\alpha_j^2) = \frac{\alpha_j}{\sqrt{2}} N_{\alpha_j}(s_j). \]

Since $N_{\alpha_j}$ is a probability distribution,
$\int N_{\alpha_j}(s_j) \, ds_j = 1$, so from Eq. 6.29, $\|\phi\|^2$ is the
product of the normalization factors $\alpha_j/\sqrt{2}$:

\[ \|\phi\|^2 = \frac{\prod_j \alpha_j}{2^{n/2}}, \]

which is the first formula we need.

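Next consider the variance of $\phi$. It too can be rewritten, as a product of
Gaussians and a quadratic factor:

\begin{align*}
\|s_k\phi\|^2
  &= \int \cdots \int s_k^2 \exp[-2\pi(s_1^2/\alpha_1^2 + \cdots + s_n^2/\alpha_n^2)] \, ds_1 \cdots ds_n \\
  &= \int s_k^2 \exp(-2\pi s_k^2/\alpha_k^2) \, ds_k \times \prod_{j \neq k} \int \exp(-2\pi s_j^2/\alpha_j^2) \, ds_j.
\end{align*}

These can be rewritten in terms of normal distributions:

\[ \|s_k\phi\|^2 = \frac{\alpha_k}{\sqrt{2}} \int s_k^2 N_{\alpha_k}(s_k) \, ds_k \times \prod_{j \neq k} \frac{\alpha_j}{\sqrt{2}} \int N_{\alpha_j}(s_j) \, ds_j. \]

The first integral is the variance of the normal distribution, which we saw to
be $\sigma^2 = \alpha_k^2/4\pi$; the remaining integrals are again 1, so

\[ \|s_k\phi\|^2 = \frac{\alpha_k^2}{4\pi} \, \frac{\prod_j \alpha_j}{2^{n/2}}. \]

Hence we see that the normalized variance of $\phi$ around the $k$th axis is

\[ \frac{\|s_k\phi\|^2}{\|\phi\|^2} = \frac{\alpha_k^2}{4\pi} \, \frac{\prod_j \alpha_j}{2^{n/2}} \, \frac{2^{n/2}}{\prod_j \alpha_j} = \frac{\alpha_k^2}{4\pi}. \]

Thus $\alpha_k$ is the standard deviation along the $k$th axis, scaled by
$2\sqrt{\pi}$:

\[ \Delta s_k = \frac{\alpha_k}{2\sqrt{\pi}}. \tag{6.30} \]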


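Exactly the same analysis can be applied to the Fourier transform $\Phi$,
except that the variance of the normal distribution $N_{\alpha_j}$ is
$\sigma^2 = 1/(4\pi\alpha_j^2)$ and the normalization factor is
$1/(\sqrt{2}\,\alpha_j)$. Hence,

\[ \|\Phi\|^2 = \frac{1}{2^{n/2} \prod_j \alpha_j}, \]

\[ \|\sigma_k\Phi\|^2 = \frac{1}{4\pi\alpha_k^2} \, \frac{1}{2^{n/2} \prod_j \alpha_j}. \]

Hence,

\[ \frac{\|\sigma_k\Phi\|^2}{\|\Phi\|^2} = \frac{1}{4\pi\alpha_k^2}, \]

and we see that $\alpha_k^{-1}$ is the standard deviation along the $k$th
spectral axis, scaled by $2\sqrt{\pi}$; that is,
$\Delta\sigma_k = 1/(2\sqrt{\pi}\,\alpha_k)$.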

Now the product of the variances in the space-time and spectral domains
is easy to compute:

\[ \frac{\|s_k\phi\|^2 \, \|\sigma_k\Phi\|^2}{\|\phi\|^2 \, \|\Phi\|^2} = \frac{\alpha_k^2}{4\pi} \, \frac{1}{4\pi\alpha_k^2} = \frac{1}{16\pi^2}, \]

and we see that the $n$-dimensional Gabor elementary functions achieve the
minimum area given by the uncertainty principle.
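The formulas of this section can be checked numerically for a small case. The sketch below is an illustration under our own assumptions: it takes $n = 2$, hypothetical widths `alphas`, a hypothetical wave vector `u` and position `p` (which affect only the phase), samples Eqs. 6.27 and 6.28 on uniform grids, and approximates the integrals by Riemann sums. It compares $\|\phi\|^2$ with $\prod_j \alpha_j / 2^{n/2}$ and the variance product along each axis with $1/16\pi^2$.

```python
import numpy as np

alphas = np.array([1.5, 0.8])   # hypothetical widths alpha_1, alpha_2
u = np.array([0.3, -0.2])       # hypothetical wave vector (phase only)
p = np.array([0.0, 0.0])        # hypothetical position (phase only)

x = np.linspace(-8.0, 8.0, 801) # same grid used for s and for sigma
dx = x[1] - x[0]
X1, X2 = np.meshgrid(x, x, indexing="ij")

# Eq. 6.27 sampled on the spatial grid, Eq. 6.28 on the spectral grid
phi = (np.exp(2j * np.pi * (X1 * u[0] + X2 * u[1]))
       * np.exp(-np.pi * (X1**2 / alphas[0]**2 + X2**2 / alphas[1]**2)))
Phi = (np.exp(2j * np.pi * (p[0] * X1 + p[1] * X2))
       * np.exp(-np.pi * (X1**2 * alphas[0]**2 + X2**2 * alphas[1]**2)))

def norm2(f):
    """Squared L2 norm of a function sampled on the 2D grid (Riemann sum)."""
    return np.sum(np.abs(f) ** 2) * dx * dx

norm_phi = norm2(phi)
expected_norm = np.prod(alphas) / 2 ** (len(alphas) / 2)

products = []
for Xk in (X1, X2):
    var_s = norm2(Xk * phi) / norm2(phi)       # normalized variance of phi
    var_sigma = norm2(Xk * Phi) / norm2(Phi)   # normalized variance of Phi
    products.append(var_s * var_sigma)

print(norm_phi, expected_norm)
print(products, 1 / (16 * np.pi**2))
```

Changing `u` or `p` should leave every printed value unchanged, which reflects the remark that the periodic parts have constant modulus 1 and can be ignored.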



